-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding MAP type support for ORC Reader #9132
Adding MAP type support for ORC Reader #9132
Conversation
…orc_struct_null_issue
Returning info on which columns are maps might be a part of the requirements (means a switch to |
Codecov Report
@@ Coverage Diff @@
## branch-21.10 #9132 +/- ##
===============================================
Coverage ? 10.77%
===============================================
Files ? 115
Lines ? 19138
Branches ? 0
===============================================
Hits ? 2062
Misses ? 17076
Partials ? 0 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just got a few nitpicks.
"columns", | ||
[None, ["lvl1_map", "lvl2_struct_map"], ["lvl2_struct_map", "lvl2_map"]], | ||
) | ||
@pytest.mark.parametrize("num_rows", [0, 15, 1005, 10561, 100000]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100000 is enough to have multiple stripes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it has 98 stripes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope you mean rowgroups, 98 stripes make a ~25GB file :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, it's writing with tiny stripes, now I see :D
carry on, then :)
…me and other review changes
case orc::DECIMAL: | ||
if (type == type_id::DECIMAL64) { | ||
scale = -static_cast<int32_t>(_metadata->get_types()[orc_col_id].scale.value_or(0)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to throw an exception if type == type_id::DECIMAL32
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of orc:Decimal, there are chances where they expect us to return Double. And orc supports only Decimal64 and Decimal128 if I am not wrong.
|
||
schema = po.Struct(**schema) | ||
|
||
lvl1_map = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you'd like to parametrize on lvl_map types. This right now has code inside and outside the test param gen that depend on each other. Instead you could have tuples of orc file buffers and expected values as separate params.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to avoid generating the buffer multiple times since we are looking for 100000 rows, just to execution time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. A couple very minor suggestions, but nothing worth holding up the PR over. Feel free to resolve them however you'd like and merge.
@@ -176,7 +176,8 @@ class reader::impl { | |||
*/ | |||
column_buffer&& assemble_buffer(const int32_t orc_col_id, | |||
std::vector<std::vector<column_buffer>>& col_buffers, | |||
const size_t level); | |||
const size_t level, | |||
rmm::cuda_stream_view stream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this have a default, or will we always be calling it internally with a stream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is only for internal use only.
Co-authored-by: Vyas Ramasubramani <[email protected]>
rerun tests |
@gpucibot merge |
Since cuDF still doesn't support MAP type, this will be viewed as list of structs, and struct having key and value pair as members.
#8826 Completes Reader part of MAP support.