
[REVIEW] Initial support for struct columns #5807

Merged on Aug 13, 2020 (29 commits)


mythrocks (Contributor)

Closes #5700.

This PR adds rudimentary support for struct columns in CUDF. The intended behaviour is described in #5700.
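As a rough mental model only, here is a hypothetical host-side sketch (not cudf's API, and not code from this PR). It treats a struct column as a parent with one child column per field, where every child has the parent's row count, plus a parent validity mask. The names `int_child`, `struct_column_model`, and `field()` are invented for illustration, and children are restricted to int32 for brevity. The sketch reads a field as null when either the parent row or the child element is null. That is a simplifying assumption, not a claim about cudf's exact null semantics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical host-side model of a struct column; NOT cudf's API.
// Children are restricted to int32 values for brevity.
struct int_child {
  std::vector<std::int32_t> values;
  std::vector<bool> valid;  // one validity flag per row
};

struct struct_column_model {
  std::size_t num_rows;
  std::vector<int_child> children;  // one child column per struct field
  std::vector<bool> valid;          // parent validity, one flag per row

  struct_column_model(std::size_t n, std::vector<int_child> kids, std::vector<bool> parent_valid)
    : num_rows(n), children(std::move(kids)), valid(std::move(parent_valid))
  {
    // Invariant: the parent and every child have exactly num_rows rows
    // (this also holds trivially for an empty, zero-row column).
    if (valid.size() != num_rows) { throw std::invalid_argument("parent validity size mismatch"); }
    for (auto const& c : children) {
      if (c.values.size() != num_rows || c.valid.size() != num_rows) {
        throw std::invalid_argument("child size mismatch");
      }
    }
  }

  // In this model, a field reads as null if either the parent row or the
  // child element is null.
  std::optional<std::int32_t> field(std::size_t row, std::size_t field_index) const
  {
    auto const& c = children.at(field_index);
    if (!valid.at(row) || !c.valid.at(row)) { return std::nullopt; }
    return c.values.at(row);
  }
};
```

The size check in the constructor captures the main structural rule (children stay row-aligned with the parent), which is also what makes empty and nested cases worth testing separately.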

mythrocks requested review from a team as code owners, July 30, 2020 06:18
mythrocks self-assigned this, Jul 30, 2020
GPUtester (Collaborator)

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

mythrocks added the "feature request" label (New feature or request), Jul 30, 2020
cpp/src/structs/structs_column_factories.cu (2 review threads, outdated, resolved)
mythrocks changed the title from "[WIP] Tentative support for struct columns" to "[WIP] Initial support for struct columns", Jul 31, 2020
mythrocks changed the title from "[WIP] Initial support for struct columns" to "Initial support for struct columns", Aug 6, 2020
mythrocks requested a review from a team as a code owner, August 6, 2020 18:57
cpp/include/cudf/structs/struct_view.hpp (outdated, resolved)
cpp/include/cudf/column/column_view.hpp (outdated, resolved)
codecov bot commented Aug 6, 2020

Codecov Report

Merging #5807 into branch-0.15 will decrease coverage by 4.07%.
The diff coverage is n/a.


```diff
@@               Coverage Diff               @@
##           branch-0.15    #5807      +/-   ##
===============================================
- Coverage        88.65%   84.57%   -4.08%
===============================================
  Files               57       81      +24
  Lines            10945    13852    +2907
===============================================
+ Hits              9703    11715    +2012
- Misses            1242     2137     +895
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| python/cudf/cudf/core/dtypes.py | 86.84% <0.00%> (-10.09%) | ⬇️ |
| python/cudf/cudf/core/abc.py | 91.48% <0.00%> (-2.96%) | ⬇️ |
| python/cudf/cudf/core/frame.py | 89.76% <0.00%> (-1.42%) | ⬇️ |
| python/cudf/cudf/utils/ioutils.py | 86.13% <0.00%> (-1.21%) | ⬇️ |
| python/dask_cudf/dask_cudf/sorting.py | 93.12% <0.00%> (-0.99%) | ⬇️ |
| python/cudf/cudf/core/indexing.py | 95.22% <0.00%> (-0.68%) | ⬇️ |
| python/cudf/cudf/utils/dtypes.py | 86.03% <0.00%> (-0.40%) | ⬇️ |
| python/cudf/cudf/core/column/categorical.py | 93.61% <0.00%> (-0.10%) | ⬇️ |
| python/cudf/cudf/core/groupby/groupby.py | 93.18% <0.00%> (-0.10%) | ⬇️ |
| python/cudf/cudf/core/column/column.py | 87.36% <0.00%> (-0.01%) | ⬇️ |
| ... and 62 more | | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5c57bd5...1c0ab90.

1. Added documentation for struct_column_wrapper.
2. Removed unnecessary std::move for child null masks.
mythrocks (Contributor, Author) commented Aug 10, 2020

Super! Thank you for reviewing, @raydouglass, @ajschmidt8, @jrhemstad.

Might this PR please be merged?
I'll request another C++ review.

mythrocks requested review from nvdbaranec and removed request for trevorsm7 and karthikeyann, August 11, 2020 15:30

mythrocks (Contributor, Author)

Thanks for picking up the review, @nvdbaranec. :]

nvdbaranec (Contributor) left a comment

I think it would be good to have more tests involving empty columns, particularly in the nested list/struct case. Also some tests that bounce between list<struct<list<struct< etc (with validity and nulls, etc).

cpp/src/search/search.cu (outdated, resolved)
cpp/src/scalar/scalar_factories.cpp (outdated, resolved)
cpp/src/structs/structs_column_factories.cu (outdated, resolved)
cpp/src/structs/structs_column_view.cu (outdated, resolved)
cpp/tests/structs/structs_column_tests.cu (resolved)
```cpp
cudf::test::expect_columns_equivalent(expected_unchanged_struct_col,
                                      cudf::lists_column_view(*list_col).child());

#ifndef NDEBUG
```
Contributor

@jrhemstad Is there a cudf standard for debug stuff like this? This is generally something I like to do but have avoided so far because it doesn't seem to be used in cudf at large.

Contributor, Author

I took my cue for the printing from rolling_test (although there, it was under an #if 0).

Member

Considering that the cudf debug build doesn't even succeed currently (if you got it to build, I'd love to hear about it), I'm guessing you didn't compile this code?

mythrocks (Contributor, Author), Aug 12, 2020

> I'm guessing you didn't compile this code?

Not in debug mode, but I do use the prints: I comment out the #ifndef NDEBUG when I'm debugging. :/

I left this in for when I figure out proper debugging.

Contributor, Author

I'm happy to remove these prints, if that's preferable.
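One common alternative to `#ifndef NDEBUG` or `#if 0` for opt-in test output is a dedicated switch checked with an ordinary `if`. The printing code then still compiles, and is type-checked, in every build, so it cannot silently rot the way code hidden behind a preprocessor guard can. This is only a sketch and not an existing cudf convention; the macro `STRUCTS_TEST_DEBUG_PRINT` and the helpers `to_debug_string` and `debug_print` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical opt-in switch, not an existing cudf convention. Build with
// -DSTRUCTS_TEST_DEBUG_PRINT=1 to enable output, independent of NDEBUG.
#ifndef STRUCTS_TEST_DEBUG_PRINT
#define STRUCTS_TEST_DEBUG_PRINT 0
#endif

// Formats values as "[a, b, c]".
template <typename T>
std::string to_debug_string(std::vector<T> const& values)
{
  std::ostringstream out;
  out << '[';
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i != 0) { out << ", "; }
    out << values[i];
  }
  out << ']';
  return out.str();
}

// An ordinary `if` (rather than #if) keeps this code compiled and
// type-checked in every build; when the switch is 0 the branch is dead code.
template <typename T>
void debug_print(char const* label, std::vector<T> const& values)
{
  if (STRUCTS_TEST_DEBUG_PRINT) { std::cerr << label << ": " << to_debug_string(values) << '\n'; }
}
```

Returning a string from `to_debug_string` also lets a test assert on the formatted text instead of only eyeballing printed output.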

mythrocks (Contributor, Author)

> Also some tests that bounce between list<struct<list<struct< etc (with validity and nulls, etc).

The reason I don't have deeper nested tests is:

  1. They are wordy. Both constructing the test columns and checking the results are very verbose at the moment; I'm afraid I don't have support for literals yet.
  2. Checking results for list<struct> and struct<list> has proven difficult because of the equality/equivalence issue described in [QST] expect_columns_equal() vs expect_columns_equivalent() #5867. :/
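A simplified model of the distinction #5867 asks about, under the assumption that "equal" also compares a physical property (here, whether a null mask is allocated at all) while "equivalent" compares only logical contents. This is a host-side sketch with hypothetical names (`column_model`, `logically_equivalent`, `strictly_equal`), not the implementation of the cudf test utilities.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Simplified host-side model, NOT cudf's comparator: int32 values plus an
// optional validity mask. No mask at all means "every row is valid".
struct column_model {
  std::vector<std::int32_t> values;
  std::optional<std::vector<bool>> mask;  // std::nullopt: no mask allocated

  bool is_valid(std::size_t i) const { return !mask || (*mask)[i]; }
};

// Logical comparison: same size, same validity per row, and same values in
// valid rows. Values stored under null rows are ignored.
bool logically_equivalent(column_model const& a, column_model const& b)
{
  if (a.values.size() != b.values.size()) { return false; }
  for (std::size_t i = 0; i < a.values.size(); ++i) {
    if (a.is_valid(i) != b.is_valid(i)) { return false; }
    if (a.is_valid(i) && a.values[i] != b.values[i]) { return false; }
  }
  return true;
}

// Stricter comparison: additionally requires both columns to agree on
// whether a mask is allocated, even if every row is valid.
bool strictly_equal(column_model const& a, column_model const& b)
{
  return a.mask.has_value() == b.mask.has_value() && logically_equivalent(a, b);
}
```

Under this model, a column with an all-valid mask and one with no mask compare as equivalent but not equal, which is one way comparisons of nested children built by different code paths can disagree.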

1. Added tests for deeper nesting.
2. Fixed license headers.
3. Added convenience methods to structs_column_wrapper.
mythrocks (Contributor, Author)

> The reason I don't have deeper nested tests is:

I have added tests for list<struct<list<numeric>>> and struct<list<struct<numeric>>>.
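To make the two nested shapes concrete, here is a hypothetical host-side type descriptor (not a cudf type) that builds them, with int32 standing in for "numeric". A list node has exactly one child type and a struct node has one child type per field; `type_node`, `list_of`, `struct_of`, `type_name`, and `depth` are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-side type descriptor, not a cudf type. A list has exactly
// one child type; a struct has one child type per field.
enum class kind { int32, list, structure };

struct type_node {
  kind k;
  std::vector<type_node> children;
};

type_node int32_type() { return {kind::int32, {}}; }
type_node list_of(type_node child) { return {kind::list, {std::move(child)}}; }
type_node struct_of(std::vector<type_node> fields) { return {kind::structure, std::move(fields)}; }

// Renders e.g. "list<struct<list<int32>>>".
std::string type_name(type_node const& t)
{
  switch (t.k) {
    case kind::int32: return "int32";
    case kind::list: return "list<" + type_name(t.children.at(0)) + ">";
    case kind::structure: {
      std::string s = "struct<";
      for (std::size_t i = 0; i < t.children.size(); ++i) {
        if (i != 0) { s += ", "; }
        s += type_name(t.children[i]);
      }
      return s + ">";
    }
  }
  return "unknown";
}

// Nesting depth: a leaf is 0, and each list or struct level adds 1.
int depth(type_node const& t)
{
  int deepest = 0;
  for (auto const& c : t.children) { deepest = std::max(deepest, depth(c)); }
  return t.k == kind::int32 ? 0 : deepest + 1;
}
```

For example, `list_of(struct_of({list_of(int32_type())}))` describes list<struct<list<int32>>>, and both of the shapes mentioned above have a nesting depth of 3 in this model.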

nvdbaranec self-requested a review, August 12, 2020 23:50
mythrocks merged commit aa90152 into rapidsai:branch-0.15, Aug 13, 2020

mythrocks (Contributor, Author)

Super. Thanks for reviewing, @nvdbaranec.
I'll get to work on the gather() piece now.
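The gather() work is a follow-up and is not part of this PR. As a hedged illustration of one natural approach, gathering a struct column can apply the same gather map to the parent validity and to every child, so the fields of each output row stay aligned. The host-side types `child_model` and `struct_model` and the function `gather_struct` are hypothetical. A real implementation would also need to recurse into nested children such as lists.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified host-side model (hypothetical; not cudf's gather()).
struct child_model {
  std::vector<std::int32_t> values;
  std::vector<bool> valid;
};

struct struct_model {
  std::vector<child_model> children;  // one per field, all with the same row count
  std::vector<bool> valid;            // parent validity
};

// Output row i takes input row map[i]; out-of-range indices throw via at().
child_model gather_child(child_model const& c, std::vector<std::size_t> const& map)
{
  child_model out;
  out.values.reserve(map.size());
  out.valid.reserve(map.size());
  for (std::size_t row : map) {
    out.values.push_back(c.values.at(row));
    out.valid.push_back(c.valid.at(row));
  }
  return out;
}

// The SAME map is applied to the parent validity and to every child, so each
// output row keeps its fields together. Nested children (e.g. lists) would
// need a recursive gather; this sketch handles flat int32 fields only.
struct_model gather_struct(struct_model const& s, std::vector<std::size_t> const& map)
{
  struct_model out;
  out.valid.reserve(map.size());
  for (std::size_t row : map) { out.valid.push_back(s.valid.at(row)); }
  for (auto const& c : s.children) { out.children.push_back(gather_child(c, map)); }
  return out;
}
```

With a map of {2, 0, 0}, each child's third row followed by two copies of its first row ends up in the output, and the parent validity is reordered the same way.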

Labels: 3 - Ready for Review (Ready for review by team), feature request (New feature or request)
Projects: None yet
Development: successfully merging this pull request may close these issues:
[FEA] Structs in CUDF
7 participants