Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] cudf uses deprecated Series.label_encoding function internally #9474

Closed
vyasr opened this issue Oct 19, 2021 · 6 comments · Fixed by #12040
Closed

[BUG] cudf uses deprecated Series.label_encoding function internally #9474

vyasr opened this issue Oct 19, 2021 · 6 comments · Fixed by #12040
Assignees
Labels
Python Affects Python cuDF API.

Comments

@vyasr
Copy link
Contributor

vyasr commented Oct 19, 2021

There are multiple places where cuDF internally calls the Series.label_encoding method, which was deprecated in #9289:

The first two should be rewritten to not use the deprecated function, while the third should also be deprecated.

@shwina shwina added Python Affects Python cuDF API. tech debt labels Oct 20, 2021
@bdice
Copy link
Contributor

bdice commented Oct 27, 2021

I saw @shwina added this to the 22.02 board. We should probably put in a temporary workaround to avoid showing FutureWarnings in the 21.12 release when using categorical columns or the factorize method. I'd suggest making a private function that performs the work, replace internal usage with that private function, and let the public function for label_encoding show the deprecation warning and call the private method.

@bdice
Copy link
Contributor

bdice commented Oct 27, 2021

I'm opening a PR to deprecate DataFrame.label_encoding as noted in (3) above, and also call a private _label_encoding method for cases (1) and (2). This should eliminate warnings from user code and tests except when calling the deprecated label_encoding method directly on a Series or DataFrame.

After that PR, we will still need to review whether the internal method can be simplified or removed after its public API is removed.

rapids-bot bot pushed a commit that referenced this issue Oct 29, 2021
…d internally. (#9535)

This PR addresses part of #9474 by deprecating `DataFrame.label_encoding` and adding a private method that can be used internally without raising a `FutureWarning` until the public API is removed and internals are refactored. See #9474 for more information.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #9535
@beckernick beckernick added this to the CuDF Python Refactoring milestone Nov 12, 2021
@github-actions
Copy link

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@shwina
Copy link
Contributor

shwina commented Jan 20, 2022

Closing this as it looks like #9535 addressed it. Please feel free to reopen if I missed something.

@shwina shwina closed this as completed Jan 20, 2022
@vyasr
Copy link
Contributor Author

vyasr commented Jan 24, 2022

This isn't quite addressed yet because the three internal uses of the now-internal Series._label_encoding are still present. #9535 simply added an extra level of indirection so that those would not also throw warnings because those functions are not deprecated but they were relying on the now-deprecated label_encoding function. I'm removing DataFrame.label_encoding now, which addresses one of the uses documented in this issue, but we still need to investigate the other two.

@vyasr vyasr reopened this Jan 24, 2022
@github-actions
Copy link

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@shwina shwina self-assigned this Oct 27, 2022
rapids-bot bot pushed a commit that referenced this issue Nov 30, 2022
Maybe closes #9474.

The `Series._label_encoding` function is used to integer encode columns. It makes much more sense for this to be a method of `Column` rather than `Series` (the index is irrelevant). Thus, I've moved the implementation to `Column`,

Authors:
  - Ashwin Srinath (https://github.com/shwina)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #12040
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants