[REVIEW] Preserve the correct ListDtype while creating an identical empty column #10151

Merged: 3 commits, Jan 31, 2022

galipremsagar
Contributor

Fixes: #10122

This PR fixes an issue where the dtype of a list column's children[1] wasn't being preserved correctly when creating an identical empty column.
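
For context, a list column in an Arrow-style layout is stored as an offsets array plus one flat child holding all elements; as far as I understand cuDF's list column, children[0] is the offsets and children[1] is the elements. The pure-Python sketch below only illustrates that layout. The helper names `to_offsets_and_elements` and `from_offsets_and_elements` are hypothetical and are not cuDF API.

```python
from typing import Any, List, Tuple


def to_offsets_and_elements(rows: List[List[Any]]) -> Tuple[List[int], List[Any]]:
    """Flatten nested rows into (offsets, elements), Arrow-style.

    Hypothetical helper for illustration; not part of cuDF.
    """
    offsets = [0]
    elements: List[Any] = []
    for row in rows:
        elements.extend(row)
        # Each offset marks where the next row starts in `elements`.
        offsets.append(len(elements))
    return offsets, elements


def from_offsets_and_elements(offsets: List[int], elements: List[Any]) -> List[List[Any]]:
    """Rebuild the nested rows from the flat representation."""
    return [elements[start:stop] for start, stop in zip(offsets, offsets[1:])]


# Same data as column "a" in the test from this PR.
offsets, elements = to_offsets_and_elements([["a", "b"], []])

# A column with zero rows has no elements at all, so the element type
# cannot be recovered from the data; it has to be carried by the dtype.
empty_offsets, empty_elements = to_offsets_and_elements([])
```

Because the elements child of an empty column holds no values, the element type survives only if the code that builds the empty column copies it explicitly.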

@galipremsagar added the labels bug, 3 - Ready for Review, Python, 4 - Needs cuDF (Python) Reviewer, and non-breaking on Jan 27, 2022
@galipremsagar self-assigned this on Jan 27, 2022
@galipremsagar requested a review from a team as a code owner on January 27, 2022 19:41
codecov bot commented Jan 27, 2022

Codecov Report

Merging #10151 (10bc21d) into branch-22.04 (e24fa8f) will increase coverage by 0.05%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           branch-22.04   #10151      +/-   ##
================================================
+ Coverage         10.37%   10.42%   +0.05%     
================================================
  Files               119      119              
  Lines             20149    20597     +448     
================================================
+ Hits               2091     2148      +57     
- Misses            18058    18449     +391     
Impacted Files Coverage Δ
python/cudf/cudf/errors.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/csv.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/hdf.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/orc.py 0.00% <0.00%> (ø)
python/cudf/cudf/__init__.py 0.00% <0.00%> (ø)
python/cudf/cudf/_version.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/abc.py 0.00% <0.00%> (ø)
python/cudf/cudf/api/types.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/dlpack.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <0.00%> (ø)
... and 60 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 57ac8c4...10bc21d.

@galipremsagar added the 5 - Ready to Merge label and removed the 3 - Ready for Review and 4 - Needs cuDF (Python) Reviewer labels on Jan 31, 2022
@galipremsagar
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit c25d35b into rapidsai:branch-22.04 Jan 31, 2022
Contributor

@bdice bdice left a comment

Just realized while submitting this review that I was a little late. Sorry @galipremsagar - I opened the tab earlier when you asked for review but was in a meeting until now. These suggestions might be helpful but are not blocking.

{"a": cudf.Series([["a", "b"], []]), "b": [1, 2], "c": [123, 321]}
)
df1 = df[df.b.isna()]
df1["b"] = df1["c"]
Contributor

This needs a comment to explain the logic of the test, and particularly why this line is necessary. On the surface, this line appears to do nothing: it overwrites the empty Series "b" with the data of another empty Series "c", and both are int64. In the failing case, the assignment triggers a change in the dtype of df1["a"] because that column is empty, and it goes from ListDtype(object) (list of strings) to ListDtype(int8). That side effect is not obvious because this line doesn't reference "a" at all.
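
To make that failure mode concrete, here is a toy model in plain Python. It is only a sketch: `ToyListColumn`, `infer_element_dtype`, `empty_like_by_inference`, and `empty_like_preserving` are hypothetical names, and the int8 fallback simply mirrors the symptom described above rather than cuDF's actual internals.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class ToyListColumn:
    """Hypothetical stand-in for a list column: element dtype plus rows."""
    element_dtype: str
    rows: List[List[Any]] = field(default_factory=list)


def infer_element_dtype(rows: List[List[Any]], default: str = "int8") -> str:
    """Guess the element dtype from the first value found, if any."""
    for row in rows:
        for value in row:
            return "object" if isinstance(value, str) else "int64"
    # No values to inspect: fall back to an arbitrary default (assumed here).
    return default


def empty_like_by_inference(col: ToyListColumn) -> ToyListColumn:
    """Build an empty column by re-inferring the dtype from zero rows."""
    return ToyListColumn(element_dtype=infer_element_dtype([]))


def empty_like_preserving(col: ToyListColumn) -> ToyListColumn:
    """Build an empty column by copying the source column's dtype."""
    return ToyListColumn(element_dtype=col.element_dtype)


strings = ToyListColumn("object", [["a", "b"], []])
lossy = empty_like_by_inference(strings)
faithful = empty_like_preserving(strings)
```

In this model `lossy` ends up with the fallback dtype while `faithful` keeps the original one, which is the distinction the test's final assertion checks.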

Contributor Author


df1 = df[df.b.isna()]
df1["b"] = df1["c"]
df2 = df1.drop(["c"], axis=1)
assert df2.a.dtype == df.a.dtype
Contributor


I would prefer to use the same square-bracket indexing here as above, rather than switch to attribute-based access:

Suggested change
- assert df2.a.dtype == df.a.dtype
+ assert df2["a"].dtype == df["a"].dtype

rapids-bot bot pushed a commit that referenced this pull request Jan 31, 2022
Labels: 5 - Ready to Merge, bug, non-breaking, Python
Linked issue: [BUG] DType of Elements in Lists changes
3 participants