
Reconstruct dtypes correctly for list aggs of struct columns #12290

Merged
8 commits merged into rapidsai:branch-23.02 from wence/fix/issue-11765 on Jan 23, 2023


wence- (Contributor) commented Dec 2, 2022

Description

As usual when returning from libcudf, we need to reconstruct the struct
dtype with the appropriate field labels. For groupby.agg(list), this can
be done by matching on the element_type of the result column and
reconstructing a new list dtype whose leaf is the dtype of the original
column.

Closes #11765
Closes #11907

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

wence- added the bug, 3 - Ready for Review, Python, and non-breaking labels on Dec 2, 2022
wence- requested a review from a team as a code owner on December 2, 2022 17:06
wence- requested reviews from bdice and charlesbluca on December 2, 2022 17:06
wence- force-pushed the wence/fix/issue-11765 branch from da801f5 to 4edf071 on December 5, 2022 11:31
codecov bot commented Dec 5, 2022

Codecov Report

Base: 86.58% // Head: 85.68% // Decreases project coverage by 0.90% ⚠️

Coverage data is based on head (0111854) compared to base (b6dccb3).
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-23.02   #12290      +/-   ##
================================================
- Coverage         86.58%   85.68%   -0.90%     
================================================
  Files               155      155              
  Lines             24368    24868     +500     
================================================
+ Hits              21098    21309     +211     
- Misses             3270     3559     +289     
Impacted Files Coverage Δ
python/cudf/cudf/_version.py 1.41% <0.00%> (-98.59%) ⬇️
python/cudf/cudf/core/buffer/spill_manager.py 72.50% <0.00%> (-7.50%) ⬇️
python/cudf/cudf/core/buffer/spillable_buffer.py 91.07% <0.00%> (-1.78%) ⬇️
python/cudf/cudf/utils/dtypes.py 77.85% <0.00%> (-1.61%) ⬇️
python/cudf/cudf/options.py 86.11% <0.00%> (-1.59%) ⬇️
python/cudf/cudf/core/single_column_frame.py 94.30% <0.00%> (-1.27%) ⬇️
...ython/custreamz/custreamz/tests/test_dataframes.py 98.38% <0.00%> (-1.01%) ⬇️
python/dask_cudf/dask_cudf/io/csv.py 96.34% <0.00%> (-1.00%) ⬇️
python/dask_cudf/dask_cudf/io/parquet.py 91.81% <0.00%> (-0.59%) ⬇️
python/cudf/cudf/core/multiindex.py 91.66% <0.00%> (-0.51%) ⬇️
... and 43 more


wence- force-pushed the wence/fix/issue-11765 branch from 4edf071 to fe6fdbc on January 17, 2023 10:52
wence- force-pushed the wence/fix/issue-11765 branch from fe6fdbc to 7fa7ee1 on January 17, 2023 10:59
We can't currently transfer the categorical dtype into an element of the
output list dtype, so reconstructing the result with the appropriate dtype
metadata is not possible. To avoid confusion, just remove support.
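A hedged sketch of the kind of guard this commit describes, written in plain Python rather than cuDF internals. `CategoricalDtype`, `LIST_AGGS`, `is_unsupported_list_agg`, and `check_agg` are hypothetical names; treating `"list"` and `"collect"` as the spellings of the list aggregation, and surfacing the rejection as `NotImplementedError`, are both assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoricalDtype:
    """Hypothetical stand-in for a categorical dtype."""
    categories: tuple


# Assumed spellings of the list aggregation
LIST_AGGS = frozenset({"list", "collect"})


def is_unsupported_list_agg(agg_name, dtype):
    """Return True when a list aggregation is requested on a categorical column,
    whose dtype cannot be carried into the element type of the output list."""
    return agg_name in LIST_AGGS and isinstance(dtype, CategoricalDtype)


def check_agg(agg_name, dtype):
    # Fail loudly instead of returning a result with incorrect dtype metadata
    if is_unsupported_list_agg(agg_name, dtype):
        raise NotImplementedError(
            f"{agg_name!r} aggregation is not supported for categorical columns"
        )
```

Rejecting the case outright, rather than returning a list column whose elements silently lost their categories, matches the commit's stated goal of avoiding confusion.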
wence- (Contributor, Author) commented Jan 18, 2023

One cibuildwheel run failed (I guess due to network timeouts, it looks like). Is there a way to restart this, or do I just push a(nother) merge commit and cross my fingers?

vyasr (Contributor) commented Jan 20, 2023

> One cibuildwheel run failed (I guess due to network timeouts, it looks like). Is there a way to restart this, or do I just push a(nother) merge commit and cross my fingers?

If you click on the "Details" link next to a failed check, it will take you to the relevant "Actions" section. There you'll see a "Re-run jobs" button in the top right, which you can use to rerun only the failed tests.

vyasr (Contributor) commented Jan 20, 2023

FWIW, the latest failure appears to be the same one that @madsbk observed in #12554 (comment), and merging again seems to have fixed it. I didn't look into the underlying cause upstream: perhaps the recent pandas release causes the xfailed test to succeed, or maybe there is a transient inconsistency in an upstream dependency being pulled. It's very weird that it's an xfail-related failure, though, so I'm not sure without digging further.

wence- self-assigned this on Jan 20, 2023
wence- requested reviews from bdice and shwina on January 20, 2023 10:33
wence- (Contributor, Author) commented Jan 23, 2023

/merge

rapids-bot merged commit 24efb9c into rapidsai:branch-23.02 on Jan 23, 2023
wence- deleted the wence/fix/issue-11765 branch on January 23, 2023 19:49
Labels
3 - Ready for Review (Ready for review by team), bug (Something isn't working), non-breaking (Non-breaking change), Python (Affects Python cuDF API)
Development

Successfully merging this pull request may close these issues.

[FEA] Support "COLLECT" aggregation on struct columns in cuDF
[QST] list-aggregation for struct columns
4 participants