Adding support for list and struct type in ORC Reader #8599
Merged: rapids-bot merged 77 commits into rapidsai:branch-21.08 from rgsl888prabhu:orc_list_struct_support on Jul 6, 2021

Commits (77)
7c0dcb5 python portion of supporting multiple ORC input files (jdye64)
739c32d removed commented out python code (jdye64)
b7818c9 change verb to plural form to make intention more clear (jdye64)
c57acf5 daily update checkpoint (jdye64)
1294ee0 updated to use multiple stripes. Use stripes as input drivers (jdye64)
3da9b29 Checkpoint; codebase compiles, except for tests and test runs (jdye64)
39c13e3 updated cpp test to ensure vector<vector<>> is passed to options builder (jdye64)
374e9e6 updates where row counts match now (jdye64)
6197eb2 gather_column_info probing (jdye64)
8d09d18 update orc column mapping to include hidden struct col (jdye64)
3c2ab1d merge conflicts (jdye64)
124b7ec compilation syntax error (jdye64)
76a55b2 inittial chnages (rgsl888prabhu)
a259433 fixing style (rgsl888prabhu)
38aa4d8 Support for users specify more than a single ORC file to read to cudf… (jdye64)
17ba956 Introduce insertion indexes for situations where the same stripe migh… (jdye64)
b2eb976 changes (rgsl888prabhu)
cc66a23 updates which prevent illegal device memory access when a user specif… (jdye64)
906a246 preservation commit before upstream merge in case of rollback needed (jdye64)
3419416 Merge remote-tracking branch 'upstream/branch-21.08' into orc_list_files (jdye64)
a4eddc3 changes (rgsl888prabhu)
e4f19f1 changes (rgsl888prabhu)
b7926fe Several tests passing now, but still a few that do not, checkpoint co… (jdye64)
aee5f76 list multiple rougroup breaking (rgsl888prabhu)
fce67b1 Merge remote-tracking branch 'upstream/branch-21.08' into orc_list_files (jdye64)
599730a updates for all test but decimal32 working (jdye64)
7a64f5f uncommented all tests (jdye64)
422757b add new line to end of file that vscode had removed (jdye64)
2c8a629 nested list works (rgsl888prabhu)
ac116ba nesting and num_rows works properly (rgsl888prabhu)
a7a2124 Merge remote-tracking branch 'upstream/branch-21.08' into orc_list_files (jdye64)
da2d898 nested table works (rgsl888prabhu)
d3c1e24 handling empty rows (rgsl888prabhu)
828c3c5 Merge remote-tracking branch 'upstream/branch-21.08' into orc_list_files (jdye64)
b0bb41c Merge branch 'orc_list_files' of https://github.com/jdye64/cudf into … (rgsl888prabhu)
a358293 Merge branch 'branch-21.08' of https://github.com/rapidsai/cudf into … (rgsl888prabhu)
14b6046 changes before merge (jdye64)
8847efe upstream merge (jdye64)
0aff037 cleaning (rgsl888prabhu)
095c959 changes (rgsl888prabhu)
9d09350 Merge branch 'branch-21.08' of https://github.com/rapidsai/cudf into … (rgsl888prabhu)
a9126e0 re-enable orc tests (jdye64)
ce6bf65 Updates for filter files and stripes when multiple input sources are … (jdye64)
08640b7 all the previous tests pass (rgsl888prabhu)
29fbc28 Merge branch 'orc_list_files' of https://github.com/jdye64/cudf into … (rgsl888prabhu)
99f830f Merge remote-tracking branch 'upstream/branch-21.08' into orc_list_files (jdye64)
5b159fd Fix byteRle issue with null mask (rgsl888prabhu)
f9651f4 adding test cases (rgsl888prabhu)
32a0f3a Modified read_orc to accept a single stripes list and expand it (jdye64)
58dfdd0 cleaning (rgsl888prabhu)
3da6663 partial review fix (jdye64)
dc41fdc address remaining review comments (jdye64)
9df73d5 Merge remote-tracking branch 'upstream/branch-21.08' into orc_list_files (jdye64)
f27a930 update to thrust::pair from std::pair (jdye64)
4984cc6 remove unneeded stripe_idx_in_source variable (jdye64)
45c538e review updates (jdye64)
086b043 use assert_eq (jdye64)
af94db0 compare gdf to pdf (jdye64)
5295c26 update multiple input files python test to include num_rows as well (jdye64)
39cbe71 Merge remote-tracking branch 'upstream/branch-21.08' into orc_list_files (jdye64)
6f39bad fix documentation typo (jdye64)
ddee11c cleaning (rgsl888prabhu)
6f204ce test to remove fixture and create dataset with 2 stripes (rgsl888prabhu)
3e4575c Merge remote-tracking branch 'upstream/branch-21.08' into orc_list_files (jdye64)
7206a74 Reenabled example Cmake CUDA due to cmake 3.22 bug (jdye64)
89300ff cleaning (rgsl888prabhu)
e7e3b86 Merge branch 'orc_list_files' of https://github.com/jdye64/cudf into … (rgsl888prabhu)
35f3708 Merge branch 'branch-21.08' of https://github.com/rapidsai/cudf into … (rgsl888prabhu)
82bffe7 fix failing test and cleaning (rgsl888prabhu)
f46fdb4 Merge branch 'branch-21.08' of https://github.com/rapidsai/cudf into … (rgsl888prabhu)
19fbaa5 add exclusive scan and cleaning (rgsl888prabhu)
d3e9198 row group fix (rgsl888prabhu)
61a6eb0 addressing review changes (rgsl888prabhu)
e5920c2 review changes to add 2d span (rgsl888prabhu)
87f4ce8 Merge branch 'branch-21.08' of https://github.com/rapidsai/cudf into … (rgsl888prabhu)
4eb0af4 review changes (rgsl888prabhu)
b1d9483 review changes (rgsl888prabhu)
How come we don't need to check if these are `static_cast<uint32_t>(-1)`?
The if condition in the next line will take care of that scenario.
Not a big deal, but the logic here is implicit/fragile. The invalid value (`-1`) is only covered in the next line because of unsigned integer underflow. IMO there should be a validity check for `schema_idxs[col_id].field` (and maybe `schema_idxs[col_id].parent`, not sure) before we compare against `fieldNames.size()` and potentially set the column name.
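
To make the concern in this thread concrete, here is a minimal sketch of the two styles of check. It is not the PR's code: `schema_index`, `invalid_idx`, `name_implicit`, and `name_explicit` are hypothetical names, and it assumes the comparison under review has the form `field < fieldNames.size()`. Only the member names `field` and `parent` and the sentinel `static_cast<uint32_t>(-1)` come from the comments above.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <string>
#include <vector>

// Sentinel discussed in the review: -1 converted to an unsigned 32-bit value.
constexpr std::uint32_t invalid_idx = static_cast<std::uint32_t>(-1);
static_assert(invalid_idx == std::numeric_limits<std::uint32_t>::max(),
              "-1 wraps around to the largest uint32_t value");

// Hypothetical stand-in for one entry of schema_idxs.
struct schema_index {
  std::uint32_t parent = invalid_idx;
  std::uint32_t field  = invalid_idx;
};

// Implicit style: an unset field is rejected only because the sentinel is
// larger than any realistic number of field names.
std::optional<std::string> name_implicit(schema_index const& idx,
                                         std::vector<std::string> const& field_names)
{
  if (idx.field < field_names.size()) { return field_names[idx.field]; }
  return std::nullopt;
}

// Explicit style: the sentinel is tested by name before the bounds check,
// so the intent no longer depends on unsigned wraparound.
std::optional<std::string> name_explicit(schema_index const& idx,
                                         std::vector<std::string> const& field_names)
{
  if (idx.field == invalid_idx) { return std::nullopt; }
  if (idx.field < field_names.size()) { return field_names[idx.field]; }
  return std::nullopt;
}
```

Both functions return the same result for every input; the explicit version only documents the invalid case, which is the readability point raised in the last comment. Whether `parent` needs the same guard is left open here, as it is in the thread.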