Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Be consistent in handling of default parameter values in pylibcudf #15198

Open
vyasr opened this issue Feb 29, 2024 · 4 comments
Open

[FEA] Be consistent in handling of default parameter values in pylibcudf #15198

vyasr opened this issue Feb 29, 2024 · 4 comments
Labels
feature request New feature or request pylibcudf Issues specific to the pylibcudf package

Comments

@vyasr
Copy link
Contributor

vyasr commented Feb 29, 2024

Is your feature request related to a problem? Please describe.
Currently some pylibcudf APIs have default values for parameters while others do not. Defaults are a double-edged sword because while they are convenient, the underlying libcudf APIs also have defaults and the two could diverge.

Describe the solution you'd like
pylibcudf APIs should decide whether they offer defaults. The two options would be to either offer no defaults and require users to provide all parameters, or offer the same defaults as libcudf, in which case it would be the responsibility of pylibcudf devs to keep those defaults in sync with libcudf.

Describe alternatives you've considered
There isn't really any cost to the defaults in pylibcudf diverging from those in libcudf, so as long as they're documented it might be fine to have defaults for user convenience and not be overly concerned about ensuring that they remain identical to libcudf's in perpetuity.

@wence-
Copy link
Contributor

wence- commented Mar 4, 2024

I think there should be no default parameters, to force the python-side caller to be fully explicit about stream and memory management.

My rationale is that since RMM objects are (deliberately) not smart pointers, there is no way to safely take ownership of an rmm::device_buffer from Python. Specifically, if I am passed a device_buffer whose memory resource is of a provenance I do not control, I cannot guarantee that I keep the memory resource alive for the lifetime of the buffer.

Indeed, in RMM, we don't even try!

https://github.com/rapidsai/rmm/blob/f132d4b0daa976e1ec6cbcef24f5454fe510a394/python/rmm/_lib/device_buffer.pyx#L160-L171

We just do:

        buf.c_obj = move(ptr)
        buf.mr = get_current_device_resource()

and fingers crossed, ptr.mr is the same as get_current_device_resource().

There's no way to work around this, so really the only safe way is to document in RMM that if taking ownership of a device_buffer from Python, one must only do so when the buffer has been allocated via a memory resource that the python process controls. In libcudf, we can always pass a memory resource in to any allocating routines, so if we do that from the python side with a memory resource whose lifetime we control, we can match the required contract and ensure that we do things safely.

Right now, everything works through happenstance.

@wence-
Copy link
Contributor

wence- commented Mar 4, 2024

I've opened rapidsai/rmm#1492 to clarify the requirements in docs on the RMM side.

@vyasr
Copy link
Contributor Author

vyasr commented Mar 4, 2024

Right, the mr discussion is in #14229.

Playing devil's advocate: while the pitfalls of default mrs are clear, that also makes the overall API very cumbersome when you have a libcudf API with 7 parameters (including stream/mr) of which 5 are defaulted because there are sensible choices. This would force users to also provide the other 3 in addition to stream/mr.

If we restrict the "no defaults" choice to only APIs then it would still allow us to make some improvements like those suggested in #15130, but if we disallowed defaults everywhere it could have additional impacts on how much syntactic sugar we can add to the pylibcudf API in a performant manner.

@wence-
Copy link
Contributor

wence- commented Mar 5, 2024

I think I am advocating only that the wrapping of the libcudf API provides no defaults. I would alternately be happy if the cpdef functions took keyword defaulted arguments for stream and mr (so that one can provide, say, the mr but not the stream if one wants)

@vyasr vyasr added the pylibcudf Issues specific to the pylibcudf package label May 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request pylibcudf Issues specific to the pylibcudf package
Projects
Status: Todo
Development

No branches or pull requests

2 participants