Fix __setitem__ on string columns when the scalar value ends in a null byte #12991

Merged
merged 3 commits into from
Mar 23, 2023

Conversation

wence-

@wence- wence- commented Mar 22, 2023

Description

Since numpy strings are fixed width and use a null byte as an
indicator of the end of the string, there is no way to distinguish
between numpy.str_("abc\x00").item() and numpy.str_("abc").item():
both come back as "abc". This has consequences for the scalar
preprocessing we do when constructing a cudf.Scalar, since that
usually goes through numpy.astype(...).item() and so silently drops
trailing null bytes. So, when preprocessing a scalar, if we notice it
is a string with trailing null bytes, we keep it as is.
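
A minimal sketch of the behaviour described above, using only numpy. The helper `preprocess_string_scalar` is a hypothetical stand-in for the scalar preprocessing step, not the actual cuDF code; it only illustrates the idea of returning the value untouched when it ends in a null byte.

```python
import numpy as np


def preprocess_string_scalar(value):
    """Hypothetical stand-in for scalar preprocessing (not cuDF's real code)."""
    if isinstance(value, str) and value.endswith("\x00"):
        # A round trip through a fixed-width numpy string would strip the
        # trailing null bytes, so return the Python string unchanged.
        return value
    # Otherwise take the usual route through numpy and back to a Python object.
    return np.asarray(value).astype("str").item()


# The round trip through numpy loses the trailing null byte.
round_tripped = np.array("abc\x00").item()

# With the guard in place, the trailing null byte survives.
preserved = preprocess_string_scalar("abc\x00")
plain = preprocess_string_scalar("abc")

print(repr(round_tripped), repr(preserved), repr(plain))
```

The guard only inspects the end of the string because numpy strips trailing null bytes only; null bytes elsewhere in the value survive the round trip.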

Closes #12990.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

wence- added 2 commits March 22, 2023 12:53
@wence- wence- requested a review from a team as a code owner March 22, 2023 15:03
@github-actions github-actions bot added the Python Affects Python cuDF API. label Mar 22, 2023
@wence- wence- self-assigned this Mar 22, 2023
@wence- wence- added bug Something isn't working non-breaking Non-breaking change labels Mar 22, 2023
@wence- wence- changed the title from "Fix __setitem__ on string columns when the scalar value is a null byte" to "Fix __setitem__ on string columns when the scalar value ends in a null byte" Mar 22, 2023
@galipremsagar galipremsagar added the 5 - Ready to Merge Testing and reviews complete, ready to merge label Mar 23, 2023
@galipremsagar

/merge

@rapids-bot rapids-bot bot merged commit 3a2609b into rapidsai:branch-23.04 Mar 23, 2023
@wence- wence- deleted the wence/fix/12990 branch March 23, 2023 16:24
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge bug Something isn't working non-breaking Non-breaking change Python Affects Python cuDF API.
Development

Successfully merging this pull request may close these issues.

[BUG] __setitem__ on string column with \x00 scalar loses value
2 participants