Describe the bug
Currently the SpillManager modifies rmm's current device resource on construction to allow itself to detect when allocations fail and trigger spilling. However, this memory resource change is not reverted if the manager goes out of scope. This causes problems in situations where managers may be created and then deleted as was discovered in #14958.
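To make the failure mode concrete, here is a minimal toy model of it. It does not use rmm or cudf: `_current_resource`, `ToySpillWrapper`, and `ToySpillManager` are hypothetical stand-ins, and the only thing it is meant to show is that a constructor which swaps a process-wide resource leaves that swap in place after the object is deleted.

```python
# Toy stand-ins only: none of these names are rmm or cudf APIs.
_current_resource = "base-mr"  # plays the role of rmm's current device resource


def get_current_resource():
    return _current_resource


def set_current_resource(mr):
    global _current_resource
    _current_resource = mr


class ToySpillWrapper:
    """Stand-in for an adaptor that would intercept allocation failures."""

    def __init__(self, upstream):
        self.upstream = upstream


class ToySpillManager:
    """Installs a wrapper on construction and never removes it."""

    def __init__(self):
        set_current_resource(ToySpillWrapper(get_current_resource()))


manager = ToySpillManager()
del manager  # the manager is gone ...

# ... but the process-wide resource is still the wrapper it installed.
still_wrapped = isinstance(get_current_resource(), ToySpillWrapper)
```

In this model `still_wrapped` is true after the `del`, which is the same shape as the problem described above.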
Expected behavior
Either the spill manager should attempt to remove the modification of the mr when it is garbage collected, or the test that was skipped in #14958 should be removed. If we attempt the former, one of the challenges will be that the user could have set a different memory resource themselves.
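As a rough sketch of the first option, and of why a user-set resource complicates it, the toy below only restores the previous resource if the manager's own wrapper is still the one installed. Everything here is hypothetical (`ToyRegistry`, `SpillWrapper`, `RestoringSpillManager`, and the explicit `close()` method are not cudf or rmm names), and whether an identity check like this is sufficient in practice is an open question.

```python
class ToyRegistry:
    """Hypothetical stand-in for rmm's process-wide current device resource."""

    def __init__(self, initial):
        self.current = initial


class SpillWrapper:
    """Stand-in for the adaptor the spill manager installs."""

    def __init__(self, upstream):
        self.upstream = upstream


class RestoringSpillManager:
    def __init__(self, registry):
        self._registry = registry
        self._wrapper = SpillWrapper(registry.current)
        registry.current = self._wrapper

    def close(self):
        """Undo the modification only if nobody replaced the resource since."""
        if self._registry.current is self._wrapper:
            self._registry.current = self._wrapper.upstream
            return True
        # The user installed something else in the meantime; leave it alone.
        return False


# Case 1: nothing else touched the resource, so it can be restored.
registry_a = ToyRegistry("base-mr")
restored_a = RestoringSpillManager(registry_a).close()

# Case 2: the user set their own resource after spilling was enabled.
registry_b = ToyRegistry("base-mr")
manager_b = RestoringSpillManager(registry_b)
registry_b.current = "user-mr"
restored_b = manager_b.close()
```

In case 2 the manager cannot safely restore anything, and spilling was already bypassed from the moment the user replaced the resource.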
In fact, this reveals a flaw in the current spilling logic whereby a user could enable spilling but then set the memory resource, which would invalidate the spilling approach. I'm not sure that there is a safe way to handle this right now other than modifying cudf to store a default memory resource that is passed to every algorithm rather than relying on rmm's default memory resource. That way, when spilling is enabled cudf could modify its default memory resource, and users modifying rmm's memory resource would have no effect, while attempting to modify cudf's memory resource would either raise an error or handle doing this in a spilling-safe manner (i.e. by wrapping the new memory resource in the callback adaptor).
This dovetails with some discussions in #14229 around the fact that default memory resources in libcudf's function signatures open us up to some pretty subtle bugs because cuDF Python often leverages libcudf and [lib]rmm in nontrivial ways.
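The cudf-owned default resource idea described above could look roughly like the following. This is a pure-Python sketch under assumptions, not a proposal for the actual API: `ToyCudfResources`, `CallbackWrapper`, `enable_spilling`, `set_default`, and `get_default` are all hypothetical names, and the wrapper only stands in for wrapping a resource in the callback adaptor.

```python
class CallbackWrapper:
    """Toy stand-in for wrapping a resource in an allocation-failure callback adaptor."""

    def __init__(self, upstream, on_failure):
        self.upstream = upstream
        self.on_failure = on_failure


class ToyCudfResources:
    """Hypothetical cudf-owned default resource, kept separate from rmm's global one."""

    def __init__(self, initial):
        self._default = initial
        self._spill_callback = None

    def enable_spilling(self, on_failure):
        # Wrap the current default so allocation failures can trigger spilling.
        self._spill_callback = on_failure
        self._default = CallbackWrapper(self._default, on_failure)

    def set_default(self, mr):
        # Spilling-safe path: rewrap whatever the user passes in.
        # The alternative mentioned above would be to raise an error here.
        if self._spill_callback is not None:
            mr = CallbackWrapper(mr, self._spill_callback)
        self._default = mr

    def get_default(self):
        # This is what would be passed explicitly to every algorithm.
        return self._default


def spill_on_failure(nbytes):
    """Placeholder callback; a real one would try to free device memory."""
    return False


resources = ToyCudfResources("base-mr")
resources.enable_spilling(spill_on_failure)
resources.set_default("user-mr")  # user swaps the resource after enabling spilling
current = resources.get_default()
```

Here `current` is still a wrapper around `"user-mr"`, so the spilling hook survives the user's change. Changes to rmm's own global resource never reach this object at all, which is the property the proposal is after.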