Report number of rows per file read by PQ reader when no row selection and fix segfault in chunked PQ reader when skip_rows > 0 #16195
Binary search the lower and upper index into `partial_sum_nrows_source` and compute the number of rows seen per source in between.
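As a rough illustration of the idea in the comment above, here is a minimal standalone sketch. It assumes `partial_sum_nrows_source[i]` is an inclusive prefix sum (total rows in sources `0..i`) and that the rows of interest form a half-open range `[start_row, end_row)`. The function name `rows_per_source` and its signature are hypothetical and are not taken from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper (not the PR's code). Assumes partial_sum_nrows_source[i]
// is the total number of rows in sources 0..i (inclusive prefix sum).
// Returns how many rows of the range [start_row, end_row) fall in each source.
std::vector<std::size_t> rows_per_source(
  std::vector<std::size_t> const& partial_sum_nrows_source,
  std::size_t start_row,
  std::size_t end_row)
{
  assert(start_row <= end_row);
  std::vector<std::size_t> counts(partial_sum_nrows_source.size(), 0);
  if (partial_sum_nrows_source.empty()) { return counts; }

  // Clamp the range to the total number of rows available.
  end_row = std::min(end_row, partial_sum_nrows_source.back());
  if (start_row >= end_row) { return counts; }

  auto const first = partial_sum_nrows_source.cbegin();
  auto const last  = partial_sum_nrows_source.cend();

  // Lower index: first source whose prefix sum is strictly greater than start_row.
  auto const lo = std::distance(first, std::upper_bound(first, last, start_row));
  // Upper index: first source whose prefix sum reaches end_row.
  auto const hi = std::distance(first, std::lower_bound(first, last, end_row));

  // Count the overlap of [start_row, end_row) with each source in between.
  for (auto i = lo; i <= hi; ++i) {
    std::size_t const src_begin = (i == 0) ? 0 : partial_sum_nrows_source[i - 1];
    std::size_t const src_end   = partial_sum_nrows_source[i];
    counts[i] = std::min(src_end, end_row) - std::max(src_begin, start_row);
  }
  return counts;
}
```

For three sources with 100, 50 and 200 rows (prefix sums `{100, 150, 350}`), the range `[120, 300)` skips the first source entirely, takes 30 rows from the second and 150 from the third.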
All of the diff in this file only fixes the arithmetic that led to the segfault for `skip_rows`. These changes are needed for the gtests to pass.
how come we don't need to account for skip_rows here?
Really good question: This is because `_file_itm_data.input_pass_start_row_count` is computed from the selected row groups (in `reader_impl_helpers.cpp`), so the offsets are adjusted for `global_skip_rows` (meaning offset 0 = `global_skip_rows`). Thus we re-add `global_skip_rows` to get `pass.start_row = start_row + global_skip_rows`, but adding it to `end_row` is redundant since we only need to compute `pass.num_rows = end_row + global_skip_rows - start_row - global_skip_rows`, which simplifies to `end_row - start_row`.
I hope this makes sense lol. Here's an example. Say `global_skip_rows = 120` and we have `_file_itm_data.input_pass_start_row_count = [0, 80, 180, 280, 380]`, then we should compute passes as:
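The commenter's own worked output did not survive on this page, so the following is only a sketch that applies the rule stated earlier in the thread (`pass.start_row = start_row + global_skip_rows`, `pass.num_rows = end_row - start_row`) to these numbers. It assumes consecutive entries of `input_pass_start_row_count` delimit one pass each and ignores any further clamping the reader may do. The `pass_rows` struct and `compute_passes` function are hypothetical names, not the PR's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-pass row bookkeeping discussed above.
struct pass_rows {
  std::size_t start_row;  // absolute start row, including global_skip_rows
  std::size_t num_rows;   // rows in this pass
};

// Assumes consecutive entries of input_pass_start_row_count are the
// [start_row, end_row) boundaries of each pass, with offset 0 corresponding
// to global_skip_rows.
std::vector<pass_rows> compute_passes(
  std::vector<std::size_t> const& input_pass_start_row_count,
  std::size_t global_skip_rows)
{
  std::vector<pass_rows> passes;
  for (std::size_t i = 0; i + 1 < input_pass_start_row_count.size(); ++i) {
    auto const start_row = input_pass_start_row_count[i];
    auto const end_row   = input_pass_start_row_count[i + 1];
    assert(start_row <= end_row);
    // global_skip_rows is re-added to the start only; it cancels out of num_rows.
    passes.push_back({start_row + global_skip_rows, end_row - start_row});
  }
  return passes;
}
```

Under those assumptions, `compute_passes({0, 80, 180, 280, 380}, 120)` yields four passes with `(start_row, num_rows)` of `(120, 80)`, `(200, 100)`, `(300, 100)` and `(400, 100)`.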
The loop will now stop as soon as `count >= rows_to_read + rows_to_skip`.
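To make the stopping condition concrete, here is a minimal sketch of such a loop. It assumes `count` is a running total of rows over the row groups visited so far. The function `select_row_groups_sketch` and its parameters are hypothetical and do not reproduce the reader's actual row-group selection code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: returns the indices of row groups that overlap the
// row range [rows_to_skip, rows_to_skip + rows_to_read).
std::vector<std::size_t> select_row_groups_sketch(
  std::vector<std::size_t> const& row_group_num_rows,
  std::size_t rows_to_skip,
  std::size_t rows_to_read)
{
  std::vector<std::size_t> selected;
  if (rows_to_read == 0) { return selected; }

  std::size_t count = 0;  // rows seen so far, across all visited row groups
  for (std::size_t i = 0; i < row_group_num_rows.size(); ++i) {
    count += row_group_num_rows[i];
    // A row group is needed once the running total passes the skipped rows.
    if (count > rows_to_skip) { selected.push_back(i); }
    // Stop as soon as every requested row has been covered.
    if (count >= rows_to_read + rows_to_skip) { break; }
  }
  return selected;
}
```

With four row groups of 100 rows each, `rows_to_skip = 120` and `rows_to_read = 150`, the loop selects row groups 1 and 2 and exits without visiting row group 3.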