[REVIEW] Preserve the correct `ListDtype` while creating an identical empty column #10151

galipremsagar · 2022-01-27T19:41:51Z

This PR fixes an issue where the dtype of a list column's children[1] wasn't being preserved correctly.

codecov · 2022-01-27T21:25:37Z

Codecov Report

Merging #10151 (10bc21d) into branch-22.04 (e24fa8f) will increase coverage by 0.05%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           branch-22.04   #10151      +/-   ##
================================================
+ Coverage         10.37%   10.42%   +0.05%     
================================================
  Files               119      119              
  Lines             20149    20597     +448     
================================================
+ Hits               2091     2148      +57     
- Misses            18058    18449     +391

Impacted Files	Coverage Δ
python/cudf/cudf/errors.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/io/csv.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/io/hdf.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/io/orc.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/__init__.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/_version.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/abc.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/api/types.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/io/dlpack.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/frame.py	`0.00% <0.00%> (ø)`
... and 60 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 57ac8c4...10bc21d.

galipremsagar · 2022-01-31T17:35:03Z

@gpucibot merge

bdice

I just realized while submitting this review that I was a little late. Sorry @galipremsagar - I opened the tab earlier when you asked for review but was in a meeting until now. These suggestions might be helpful but are not blocking.

bdice · 2022-01-31T18:16:49Z

python/cudf/cudf/tests/test_list.py

+        {"a": cudf.Series([["a", "b"], []]), "b": [1, 2], "c": [123, 321]}
+    )
+    df1 = df[df.b.isna()]
+    df1["b"] = df1["c"]


This needs a comment to explain the logic of the test, and particularly why this line is necessary. On the surface, this line appears to do nothing (it assigns an empty Series "b" to have the data of another empty Series "c", but both are int64 types). In the failing case, this triggers a change in the dtype of df1["a"] because it is empty and goes from ListDtype(object) (list of strings) to ListDtype(int8). That side-effect is not clear because this line doesn't reference "a" at all.

Addressed these comments in https://github.com/rapidsai/cudf/pull/10176/files
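
The side effect described in the comment above can be sketched with a small pure-Python toy model. This is not cuDF code: `Column`, `empty_like_buggy`, and `empty_like_fixed` are hypothetical names, and the `ListDtype` class here is a stand-in for illustration, not `cudf.ListDtype`. It only shows the general idea that rebuilding an empty list column without carrying over its element type changes the dtype, while passing the original dtype through preserves it.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass(frozen=True)
class ListDtype:
    """Toy stand-in for a list dtype: records only the element type."""

    element_type: str


@dataclass
class Column:
    """Toy column: a dtype plus its values (hypothetical, not a cuDF class)."""

    dtype: Any
    values: List[Any] = field(default_factory=list)


def empty_like_buggy(col: Column) -> Column:
    # Assumed failure mode: the list dtype is rebuilt from scratch, so the
    # element type is lost and falls back to a placeholder ("int8" here,
    # mirroring the dtype named in the review comment).
    if isinstance(col.dtype, ListDtype):
        return Column(ListDtype("int8"))
    return Column(col.dtype)


def empty_like_fixed(col: Column) -> Column:
    # Carry the original dtype over unchanged, nested element type included.
    return Column(col.dtype)


strings = Column(ListDtype("object"), [["a", "b"], []])
print(empty_like_buggy(strings).dtype)  # ListDtype(element_type='int8')
print(empty_like_fixed(strings).dtype)  # ListDtype(element_type='object')
```

In the toy model, as in the test under review, the dtype change only shows up when an empty column is created, which is why the test first filters the frame down to zero rows.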

bdice · 2022-01-31T18:20:00Z

python/cudf/cudf/tests/test_list.py

+    df1 = df[df.b.isna()]
+    df1["b"] = df1["c"]
+    df2 = df1.drop(["c"], axis=1)
+    assert df2.a.dtype == df.a.dtype


I would prefer to use the same square-bracket indexing here as above, rather than switch to attribute-based access:

Suggested change

-    assert df2.a.dtype == df.a.dtype
+    assert df2["a"].dtype == df["a"].dtype

PR to address #10151 (review)

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #10176

galipremsagar added 3 commits January 27, 2022 11:30

handle list dtype in column_empty

02f87a4

add test

78d763a

copyright

10bc21d

galipremsagar added the bug, 3 - Ready for Review, Python, 4 - Needs cuDF (Python) Reviewer, and non-breaking labels Jan 27, 2022

galipremsagar self-assigned this Jan 27, 2022

galipremsagar requested a review from a team as a code owner January 27, 2022 19:41

galipremsagar requested review from rgsl888prabhu and brandon-b-miller January 27, 2022 19:41

brandon-b-miller approved these changes Jan 31, 2022

galipremsagar added the 5 - Ready to Merge label and removed the 3 - Ready for Review and 4 - Needs cuDF (Python) Reviewer labels Jan 31, 2022

rapids-bot bot merged commit c25d35b into rapidsai:branch-22.04 Jan 31, 2022

bdice reviewed Jan 31, 2022

galipremsagar mentioned this pull request Jan 31, 2022

[REVIEW] Add comments to explain test validation #10176

Merged
