When a test that uses a MemoryResource with an UpstreamResourceAdaptor raises or fails, it results in a segfault in the subsequent test:
```python
# test_segfault.py
import rmm
import pytest
import gc


@pytest.fixture(scope="function", autouse=True)
def rmm_auto_reinitialize(request):
    # Run the test
    yield
    # Automatically reinitialize the current memory resource after running each
    # test
    rmm.reinitialize()


def test_one():
    mr = rmm.mr.PoolMemoryResource(rmm.mr.CudaMemoryResource())
    rmm.mr.set_current_device_resource(mr)
    buf = rmm.DeviceBuffer(size=10)
    bl  # raises


def test_two():
    gc.collect()
```

```shell
pytest test_segfault.py  # segfaults
```
I'm still trying to get to the bottom of this, but I think it has something to do with the traceback object corresponding to the error keeping objects alive for longer than expected, which causes things to be destructed in the wrong order. Will report back when I know more.
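The suspected mechanism can be illustrated in pure Python, without RMM or a GPU. This is only a sketch of the general behaviour, not a reproduction of the segfault: `FakeResource` and `failing_test` are hypothetical stand-ins for the memory resource and the failing test, and keeping the exception in `saved` is assumed to mimic a test runner holding on to it for reporting.

```python
import gc
import weakref


class FakeResource:
    """Hypothetical stand-in for a memory resource; not an RMM class."""


def failing_test(refs):
    res = FakeResource()
    refs.append(weakref.ref(res))  # lets us observe when `res` is destroyed
    raise RuntimeError("simulated test failure")


refs = []
try:
    failing_test(refs)
except RuntimeError as exc:
    saved = exc  # assumption: the test runner keeps the exception around

# The exception holds a traceback, the traceback holds the frame of
# `failing_test`, and that frame still holds the local variable `res`.
alive_while_saved = refs[0]() is not None

# Once the exception (and with it the traceback) is dropped, the frame
# and its locals can finally be destroyed.
del saved
gc.collect()
alive_after = refs[0]() is not None

print(alive_while_saved, alive_after)
```

In the failing test above, the locals kept alive this way would be `mr` and `buf`, so their destruction is deferred until the traceback itself is released.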
Closes #1169.
Essentially, we are running into the situation described in https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#disabling-cycle-breaking-tp-clear with `UpstreamResourceAdaptor`.
The solution is to prevent clearing of `UpstreamResourceAdaptor` objects by decorating them with `no_gc_clear`.
Cython calls out the following:
> If you use no_gc_clear, it is important that any given reference cycle contains at least one object without no_gc_clear. Otherwise, the cycle cannot be broken, which is a memory leak.
The other object in RMM that we mark `@no_gc_clear` is `DeviceBuffer`, and a `DeviceBuffer` can keep a reference to an `UpstreamResourceAdaptor`. But, an `UpstreamResourceAdaptor` cannot keep a reference to a `DeviceBuffer`, so instances of the two cannot form a reference cycle AFAICT.
Authors:
- Ashwin Srinath (https://github.com/shwina)
Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Mark Harris (https://github.com/harrism)
URL: #1170
shwina added a commit to shwina/rmm that referenced this issue on Dec 19, 2022