Convert Column Name to String Before Using Struct Column Factory #10156

Merged

isVoid (Contributor) commented Jan 28, 2022

Closes #10155

build_struct_column requires field names to be strings, but DataFrame column names can be any hashable type. Passing column names directly as field names in to_struct is therefore unsafe. This PR adds a check and raises a warning if a cast to string is required.
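
To make the mismatch concrete, here is a minimal sketch in plain Python. It does not use cuDF; `non_string_names` is a hypothetical helper introduced only for illustration, showing which perfectly legal (hashable) column labels would not qualify as struct field names.

```python
from typing import Hashable, Iterable, List


def non_string_names(names: Iterable[Hashable]) -> List[Hashable]:
    """Return the names that are not `str`, preserving their order.

    Hypothetical helper for illustration; not part of cuDF.
    """
    return [name for name in names if not isinstance(name, str)]


# Column labels only need to be hashable, so all of these are legal labels.
column_names = ["a", 1, ("x", 2)]

# Only "a" could be used unchanged as a struct field name.
offending = non_string_names(column_names)
print(offending)  # [1, ('x', 2)]
```

Any non-empty result means the names would have to be converted before they could serve as field names.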

@isVoid isVoid requested a review from a team as a code owner January 28, 2022 02:52
@github-actions github-actions bot added the Python Affects Python cuDF API. label Jan 28, 2022
codecov bot commented Jan 28, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.04@f1e0bb6).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-22.04   #10156   +/-   ##
===============================================
  Coverage                ?   10.47%           
===============================================
  Files                   ?      122           
  Lines                   ?    20505           
  Branches                ?        0           
===============================================
  Hits                    ?     2147           
  Misses                  ?    18358           
  Partials                ?        0           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f1e0bb6...0bc0ca3.

galipremsagar
galipremsagar previously approved these changes Jan 28, 2022
shwina (Contributor) commented Jan 28, 2022

Maybe we should add a user warning that an implicit conversion is happening here?

isVoid (Contributor, Author) commented Jan 29, 2022

After discussing offline with @shwina, we decided not to implicitly cast the column names to strings. Instead, every caller of the build_struct_column factory should be aware that struct column field names can only be strings, and it is the caller's responsibility to cast the names to strings before using this function. I will make that explicit in the docstring and update to_struct accordingly.
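
The division of responsibility described in this comment can be sketched roughly as follows. This is plain Python, not the cuDF implementation: `build_struct_fields` and `to_struct_fields` are hypothetical stand-ins for the factory and its caller, and raising `TypeError` on non-string names is an assumption made for the sketch, not documented behavior of build_struct_column.

```python
from typing import Any, Dict, Hashable, List, Sequence


def build_struct_fields(
    names: Sequence[str], children: Sequence[List[Any]]
) -> Dict[str, List[Any]]:
    """Hypothetical stand-in for a struct column factory.

    The factory does not cast: callers must pass string field names.
    Raising TypeError here is an assumption made for this sketch.
    """
    bad = [name for name in names if not isinstance(name, str)]
    if bad:
        raise TypeError(f"struct field names must be strings, got {bad!r}")
    return dict(zip(names, children))


def to_struct_fields(columns: Dict[Hashable, List[Any]]) -> Dict[str, List[Any]]:
    """Hypothetical caller: cast column names to str before using the factory."""
    names = [str(name) for name in columns]
    return build_struct_fields(names, list(columns.values()))


print(to_struct_fields({1: [1, 2], "b": [3, 4]}))
# {'1': [1, 2], 'b': [3, 4]}
```

One caveat of this approach: casting can make distinct labels collide (for example `1` and `"1"` both become `"1"`), which this sketch does not handle.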

@isVoid isVoid added bug Something isn't working 3 - Ready for Review Ready for review by team non-breaking Non-breaking change labels Jan 29, 2022
@isVoid isVoid requested a review from galipremsagar January 29, 2022 00:41
@isVoid isVoid dismissed galipremsagar’s stale review January 29, 2022 00:42

Code changed significantly since last review.

@isVoid isVoid changed the title Convert Column Name to String Before Constructing StructDtype Convert Column Name to String Before Using Struct Column Factory Jan 29, 2022
python/cudf/cudf/core/dataframe.py (outdated, resolved)
python/cudf/cudf/tests/test_struct.py (resolved)
skirui-source (Contributor) left a comment

Other than Bradley's suggestions LGTM!

@isVoid isVoid requested a review from bdice February 5, 2022 02:16
@isVoid isVoid added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Feb 10, 2022
isVoid (Contributor, Author) commented Feb 14, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit a443dd1 into rapidsai:branch-22.04 Feb 14, 2022
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge bug Something isn't working non-breaking Non-breaking change Python Affects Python cuDF API.
Development

Successfully merging this pull request may close these issues.

[BUG] to_struct Fails if DataFrame Column Name is not String
5 participants