Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Deprecate Series.as_mask #9828

Open
bdice opened this issue Dec 2, 2021 · 6 comments
Open

[FEA] Deprecate Series.as_mask #9828

bdice opened this issue Dec 2, 2021 · 6 comments
Labels
feature request New feature or request good first issue Good for newcomers Python Affects Python cuDF API.

Comments

@bdice
Copy link
Contributor

bdice commented Dec 2, 2021

Is your feature request related to a problem? Please describe.
Doctests (#9815) show the following doctest failing:

>>> import cudf
>>> s = cudf.Series([True, False, True])
>>> s.as_mask()
<cudf.core.buffer.Buffer object at 0x7f23c3eed0d0>
>>> s.as_mask().to_host_array()
array([ 5, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 181, 164,
188, 1, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255,
127, 253, 214, 62, 241, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
dtype=uint8)

Describe the solution you'd like
From a preliminary discussion with @vyasr, we probably want to deprecate and remove as_mask from the public API.

In the short term, I am removing the doctests for Series.as_mask().to_host_array() in #9815 (f881890) because it's somewhat untestable and not helpful to demonstrate to users if we deprecate and remove the feature. It appears that the buffer has a size of 64 bytes. I assume that's the smallest allocation RMM can give, but that's not obvious to users who look at the output data of the doctest. Furthermore, the doctest shows uninitialized (garbage) data in the output of to_host_array() which is misleading.

@bdice bdice added feature request New feature or request Python Affects Python cuDF API. deprecation labels Dec 2, 2021
@bdice bdice mentioned this issue Dec 2, 2021
9 tasks
@vyasr
Copy link
Contributor

vyasr commented Dec 3, 2021

I think along with this we should think about what mask-related APIs we really should be exposing at the Python layer. For instance, should the null_mask or nullable properties even be accessible, or are they implementation details? Series.set_mask is already deprecated. Are there any public APIs where a bitmask is a reasonable expectation, or should all public APIs involving masks operate on boolean masks even if that is less performant (that's my opinion, IMO all bitmask operations should be exclusively internal, but perhaps there's a strong performance need to expose those).

@bdice
Copy link
Contributor Author

bdice commented Dec 21, 2021

I agree that many (all?) properties dealing with bitmasks could be considered implementation details for the Python API. The APIs I'm looking at are:

(edit: updated status of APIs)

rapids-bot bot pushed a commit that referenced this issue Jan 15, 2022
This PR adds doctests and resolves #9513. Several issues were found by running doctests that have now been resolved:

- [x] #9821
- [x] #9822
- [x] #9823
- [x] #9824
- [x] #9825
- [x] #9826
- [x] #9827
- [x] #9828 (workaround by deleting doctests)
- [x] #9829

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #9815
@github-actions
Copy link

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

rapids-bot bot pushed a commit that referenced this issue Jan 28, 2022
This PR removes a large number of deprecated code paths in cuDF. This PR resolves #9465 and partially addresses #9828 (this PR does not address any mask-related API deprecations other than the removal of the already deprecated Series.set_mask).

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Ashwin Srinath (https://github.com/shwina)

URL: #10124
@github-actions
Copy link

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@vyasr
Copy link
Contributor

vyasr commented Oct 21, 2022

@shwina are you aware of any issues with proceeding here? @bdice's last comment shows what APIs still need to be updated.

@vyasr
Copy link
Contributor

vyasr commented May 13, 2024

The comment above is still accurate. I think we can go ahead and remove these.

@vyasr vyasr added the good first issue Good for newcomers label May 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request good first issue Good for newcomers Python Affects Python cuDF API.
Projects
Status: Todo
Development

No branches or pull requests

3 participants