Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix sort_values when column is all empty strings #12988

Merged
merged 2 commits into from
Mar 22, 2023

Conversation

eriknw
Copy link
Contributor

@eriknw eriknw commented Mar 21, 2023

Description

See test for simple MRE.

This fixes rapidsai/cugraph#3058

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@eriknw eriknw requested a review from a team as a code owner March 21, 2023 23:27
@github-actions github-actions bot added the Python Affects Python cuDF API. label Mar 21, 2023
if last := divisions[col].iloc[-1]:
divisions[col].iloc[-1] = chr(ord(last[0]) + 1)
else:
divisions[col].iloc[-1] = chr(1) # b/c "" < chr(1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, using chr(0) here doesn't work, because drop_duplicates below drops it by treating "" and chr(0) the same, even though "" < chr(0). Do we care that chr(1) is not printable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a bug in [i]loc.__setitem__:

import cudf
x = cudf.Series(["", chr(0)])
x.drop_duplicates() == x # True
y = cudf.Series(["a", "b"])
y.iloc[0] = ""
y.iloc[1] = chr(0)
y.iloc[0] == y.iloc[1] # True

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess in the case that last is the empty string, it suffices to provide any non-empty string as the upper bound on division?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it suffices to provide any non-empty string

Yup. How about "wence was here"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"nonempty"? I don't like to leave footprints :).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"this string intentionally left empty"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like David's suggestion (or we can wait for #12991 and use chr(0))

@wence- wence- added non-breaking Non-breaking change bug Something isn't working dask Dask issue labels Mar 22, 2023
@wence-
Copy link
Contributor

wence- commented Mar 22, 2023

/merge

@rapids-bot rapids-bot bot merged commit 0d1fb96 into rapidsai:branch-23.04 Mar 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working dask Dask issue non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FIX]: MGPropertyGraph renumber_vertices_by_type fails when type is unspecified ('')
3 participants