Adding optional parquet reader schema #11524
Why does a reader change fix #11506, which complains about the writer not working? Even if this does fix it too, shouldn't they be separate PRs?
Codecov Report
@@ Coverage Diff @@
## branch-22.10 #11524 +/- ##
===============================================
Coverage ? 86.36%
===============================================
Files ? 145
Lines ? 22949
Branches ? 0
===============================================
Hits ? 19820
Misses ? 3129
Partials ? 0
Some minor nitpicks. A couple of minor changes.
I suspect the JNI side might be missing a change, though. (I should sync with @hyperbolic2346 to confirm.)
Looks good.
As expected, manually creating a schema is difficult, but the current API makes it manageable. I have no reasonable ideas on how to improve it further.
Thank you for taking the time to clean up the test code! 🔥
- auto expected = std::make_unique<table>(std::move(cols));
- EXPECT_EQ(1, expected->num_columns());
+ auto expected = table_view{{col}};
+ EXPECT_EQ(1, expected.num_columns());
Not really a part of the review, just taking the opportunity to mention this. This is a weird check, right? IMO it's testing the table (previously) / table_view (now) constructor, rather than anything in Parquet. I would be happier without these asserts. Thoughts?
I thought they were generally in here to prevent someone coming and adding a second column and not updating the rest of the test. It is an odd thing to see though, because why would you simply add a column without even looking at the rest of the test? I'd be happy to pull them out in this PR if you desire. Not much work and not too much point.
Probably better to do it in a different PR; this one looks good to go :)
Approving, but I can't verify whether there will be a compiler issue from the missing <optional> header. I believe we can fix it quickly if there is one.
@gpucibot merge
As noticed in review of #11524, there are unnecessary asserts in the parquet tests. This removes those. Closes #11541
Authors:
- Mike Wilson (https://github.com/hyperbolic2346)
Approvers:
- Vukasin Milovanovic (https://github.com/vuule)
- Nghia Truong (https://github.com/ttnghia)
URL: #11544
Just verified in local build: no compiler issue 👍
Description
Adding a schema for reading parquet files. This is useful for things like binary data reading where the default behavior of cudf is to read it as a string column, but users wish to read it as a list column instead. Using a schema allows for nested data types to be expressed completely.
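To make the idea concrete, here is a minimal standalone sketch of how a per-column schema tree could steer the binary-versus-string decision. It does not use the cudf API: `column_schema_sketch`, `leaf_kind`, `output_type`, and `make_struct_schema` are hypothetical names invented for illustration, and the type labels it returns are placeholders rather than cudf type ids.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-column reader schema (NOT the cudf type).
// Each node carries one option and one child entry per nested column.
struct column_schema_sketch {
  bool binary_as_string = true;                // assumed default: binary is read as a string
  std::vector<column_schema_sketch> children;  // nested columns, in file order
};

// Leaf kinds reduced to the two cases this sketch needs.
enum class leaf_kind { INT32, BYTE_ARRAY };

// Pick a label for the output column type of a leaf, given its schema node.
inline std::string output_type(leaf_kind kind, column_schema_sketch const& schema)
{
  if (kind == leaf_kind::INT32) { return "INT32"; }
  // Only binary leaves are affected by the option.
  return schema.binary_as_string ? "STRING" : "LIST";
}

// Build a schema for a struct column with two binary children:
// the first keeps the default (string), the second is read as a list.
inline column_schema_sketch make_struct_schema()
{
  column_schema_sketch parent;
  parent.children.resize(2);
  parent.children[1].binary_as_string = false;
  assert(parent.children.size() == 2);
  return parent;
}
```

Because the schema mirrors the nesting of the file, a choice can be made independently for each leaf, which is what lets nested types be described completely. Calling `output_type(leaf_kind::BYTE_ARRAY, make_struct_schema().children[1])` in this sketch yields the list label, while the first child keeps the string default.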