Extract method to drive PageIterator -> RecordReader #1031
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #1031      +/-   ##
==========================================
- Coverage   82.31%   82.30%    -0.01%
==========================================
  Files         168      168
  Lines       49031    49022        -9
==========================================
- Hits        40360    40350       -10
- Misses       8671     8672        +1

Continue to review full report at Codecov.
If for example the first page of the first row group contained a compression codec that was not supported, this would currently error in the constructor, whereas with this change it will error in ArrayReader::next_batch. I personally think this makes more sense. FWIW I think this is the only possible error case and so it is unlikely user-code would be impacted...
I agree this seems like a reasonable change
This change looks good to me.
cc @sunchao
Co-authored-by: Andrew Lamb <[email protected]>
LGTM (pending CI)
if records_read_once < records_to_read {
    if let Some(page_reader) = pages.next() {
        // Read from new page reader (i.e. column chunk)
        record_reader.set_page_reader(page_reader?)?;
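The hunk above is the heart of the extracted helper: drain the current column chunk, then pull the next `PageReader` from the `PageIterator`. A simplified, self-contained sketch of that loop is below — the `PageReader` and `RecordReader` types here are hypothetical stand-ins, not the parquet crate's real API:

```rust
// Simplified sketch of the extracted read_records helper. The types are
// invented stand-ins for the parquet crate's RecordReader / PageIterator,
// just to keep the example self-contained and runnable.

/// Stand-in for a PageReader covering one column chunk.
struct PageReader {
    records: usize,
}

/// Stand-in for the RecordReader, which consumes the current page reader.
struct RecordReader {
    current: Option<PageReader>,
    consumed: usize,
}

impl RecordReader {
    fn set_page_reader(&mut self, reader: PageReader) -> Result<(), String> {
        self.consumed = 0;
        self.current = Some(reader);
        Ok(())
    }

    /// Read up to `num` records from the current chunk; returns the count read.
    fn read_records(&mut self, num: usize) -> Result<usize, String> {
        let available = match &self.current {
            Some(page) => page.records - self.consumed,
            None => 0,
        };
        let read = num.min(available);
        self.consumed += read;
        Ok(read)
    }
}

/// Drive `record_reader` from `pages` until `records_to_read` records have
/// been read or the page iterator is exhausted. Errors from `pages` (e.g.
/// an unsupported compression codec) surface here rather than up front.
fn read_records(
    record_reader: &mut RecordReader,
    pages: &mut dyn Iterator<Item = Result<PageReader, String>>,
    records_to_read: usize,
) -> Result<usize, String> {
    let mut records_read = 0;
    while records_read < records_to_read {
        let records_read_once = record_reader.read_records(records_to_read - records_read)?;
        records_read += records_read_once;
        // Current chunk exhausted: advance to the next column chunk, if any.
        if records_read < records_to_read {
            if let Some(page_reader) = pages.next() {
                record_reader.set_page_reader(page_reader?)?;
            } else {
                break;
            }
        }
    }
    Ok(records_read)
}

fn main() {
    let mut reader = RecordReader { current: None, consumed: 0 };
    let mut pages = vec![Ok(PageReader { records: 3 }), Ok(PageReader { records: 5 })].into_iter();
    // Reads 3 records from the first chunk and 3 from the second.
    assert_eq!(read_records(&mut reader, &mut pages, 6), Ok(6));
}
```

With a helper of this shape, both `PrimitiveArrayReader` and `NullArrayReader` can share the chunk-advancing loop instead of each duplicating it.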
Unrelated, but I'm wondering whether we can reset the record_reader upon a new column chunk, so that we don't have to keep accumulating buffers for def & rep levels, and values.
If we just called reset here, we would lose data. But we definitely could delimit record batches at column chunk boundaries, i.e. not buffer data across them. That would be a breaking change, though.
Got it, thanks!
Merged, thanks!
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Raphael Taylor-Davies <[email protected]>
Which issue does this PR close?
Relates to #171

Rationale for this change
The logic for driving a RecordReader from a PageIterator is currently duplicated in PrimitiveArrayReader and NullArrayReader. This duplication will only increase with new ArrayReader implementations added as part of #171.

What changes are included in this PR?
Extracts a read_records function to handle this logic.

Are there any user-facing changes?
This no longer eagerly populates the RecordReader with the first PageReader from the PageIterator. If, for example, the first page of the first row group contained a compression codec that was not supported, this would currently error in the constructor, whereas with this change it will error in ArrayReader::next_batch. I personally think this makes more sense. FWIW I think this is the only possible error case, so it is unlikely user code would be impacted...
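The behavioural change described above can be illustrated with a minimal sketch — the types below are invented stand-ins for illustration, not the parquet crate's actual `ArrayReader` API:

```rust
// Hypothetical sketch of the user-facing change: the reader no longer
// pulls the first PageReader during construction, so errors (e.g. an
// unsupported compression codec) are deferred to next_batch. The types
// here are invented stand-ins, not the parquet crate's real API.

struct ArrayReader {
    pages: std::vec::IntoIter<Result<String, String>>,
}

impl ArrayReader {
    /// After this change, construction just stores the page iterator and
    /// cannot fail on a bad first page.
    fn new(pages: Vec<Result<String, String>>) -> Self {
        ArrayReader { pages: pages.into_iter() }
    }

    /// A bad first page now surfaces its error here instead.
    fn next_batch(&mut self) -> Result<Option<String>, String> {
        match self.pages.next() {
            Some(page) => page.map(Some),
            None => Ok(None),
        }
    }
}

fn main() {
    // Construction succeeds even though the first page is unreadable...
    let mut reader = ArrayReader::new(vec![Err("unsupported codec".to_string())]);
    // ...and the error only appears on the first next_batch call.
    assert!(reader.next_batch().is_err());
}
```

Code that previously handled errors around construction would instead see them on the first `next_batch` call, which is why this is noted as the one place user code might be affected.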