Remove drop_nan parameter from IndexedFrame._drop_na_rows. #10125

Closed
bdice opened this issue Jan 25, 2022 · 0 comments · Fixed by #10140
Labels
Python Affects Python cuDF API.

bdice commented Jan 25, 2022

While reviewing #10123, I noticed the drop_nan parameter should be removed from IndexedFrame._drop_na_rows (it should always be treated as True).

drop_nan currently defaults to False, but this internal method is called in only one place, and that call hard-codes drop_nan as True.

result = self._drop_na_rows(
    how=how, subset=subset, thresh=thresh, drop_nan=True
)

Moreover, the corresponding method _drop_na_columns does not have such a flag.

def _drop_na_columns(self, how="any", subset=None, thresh=None):

This change should target 22.04. A hotfix #10123 affecting this code path is going to go into 22.02.
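To illustrate the proposed cleanup, here is a minimal sketch in plain Python. It is not cuDF's implementation: `ToyFrame`, `_is_null`, and the list-of-dicts row layout are hypothetical stand-ins, and only the method names `dropna` and `_drop_na_rows` and the `how`/`subset`/`thresh` parameters come from the snippets above. The point is that once NaN is always treated as null, the internal method no longer needs a `drop_nan` flag and the single caller no longer has to pass one.

```python
import math


def _is_null(value):
    # Hypothetical helper: treat both None and float NaN as missing,
    # which corresponds to the always-on drop_nan=True behavior.
    return value is None or (isinstance(value, float) and math.isnan(value))


class ToyFrame:
    """Hypothetical stand-in for a frame, stored as a list of row dicts."""

    def __init__(self, rows):
        self.rows = list(rows)

    def dropna(self, how="any", subset=None, thresh=None):
        # The single caller no longer passes drop_nan=True.
        return self._drop_na_rows(how=how, subset=subset, thresh=thresh)

    def _drop_na_rows(self, how="any", subset=None, thresh=None):
        kept = []
        for row in self.rows:
            keys = list(row) if subset is None else list(subset)
            # Count the non-missing values among the columns considered.
            valid = sum(not _is_null(row[k]) for k in keys)
            if thresh is not None:
                keep = valid >= thresh      # require at least `thresh` valid values
            elif how == "all":
                keep = valid > 0            # drop only rows that are entirely missing
            else:
                keep = valid == len(keys)   # how == "any": drop on any missing value
            if keep:
                kept.append(row)
        return ToyFrame(kept)
```

With this shape, the signature of `_drop_na_rows` matches that of `_drop_na_columns`, which is the symmetry the issue asks for.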

Originally posted by @bdice in #10123 (comment)

bdice added the Python (Affects Python cuDF API), non-breaking (Non-breaking change), and tech debt labels and removed the non-breaking (Non-breaking change) label Jan 25, 2022
bdice self-assigned this Jan 25, 2022
bdice changed the title from "Remove drop_nan parameter from DataFrame._drop_na_rows." to "Remove drop_nan parameter from IndexedFrame._drop_na_rows." Jan 25, 2022
rapids-bot bot pushed a commit that referenced this issue Jan 27, 2022
This PR removes the `drop_nan` parameter from the internal API `IndexedFrame._drop_na_rows`. Its behavior was unused internally in cudf (always set to `True` in the public API `IndexedFrame.dropna`). The behavior of `drop_nan=False` was untested until 22.02 hotfix #10123, when an issue was found in gpu-bdb. However, that code can use the public API `df.dropna(axis=0)` instead. See rapidsai/gpu-bdb#228.

This is marked as a non-breaking change because it only affects internal APIs.

Resolves #10125, follows up on #10123.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #10140