Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for large string columns to Parquet reader and writer #15632
Add support for large string columns to Parquet reader and writer #15632
Changes from 13 commits
072f7a7
1e44820
6c6c6fd
de26d50
b67d9d3
70ee8c4
2c45fdf
7035112
aea2691
3b77620
32452d4
8748094
5b0156d
5587bff
a9a2230
d465fab
3a2e64e
16b94b8
92c062c
8f8e3ae
182ce50
9d4dcbe
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these align declarators needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tbh, I started using the
__align__
as a monkey-see-monkey-do kind of thing 😅. I don't know if they're actually necessary at this point.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this one is the only one that may have purpose
https://github.com/rapidsai/cudf/pull/15632/files#diff-52e09ddca44181e11af56d8526360207906f5f25ba888cf51efbd2c1b15d775cR957
Only because
page_state_s
is a structure but it should probably have been declared withalignas(16)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove them since they're unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that the input column could have been sliced?
If so, then this would be more correct.
Note that if the column has not been sliced then
column.offset()==0
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, IIRC it was originally written that way due to a concern @vuule had about sliced columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't
get_offset_value
already adjust for the column offset? AFAICT, it's usingget_value
, which usesdata()
, which is implemented ashead<T>() + _offset
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. The
get_offset_value()
takes an offsets column which does not include it's parent's sliced values offset/size.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thank you for clearing that up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
candidate for inclusion in
StringsLargeTest
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good idea.
I think I'd want it behave like the
CUDF_TEST_ENABLE_LARGE_STRINGS()
macro where it automatically unsets the environment variable at the end of the scope. I can do this in a follow on PR so to keep this one more focused.