-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove row group filter for map and list #10510
Conversation
✅ Deploy Preview for meta-velox canceled.
|
8cbcd96
to
eb2dc6a
Compare
@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Hi @yma11, could you please rebase this PR onto the latest main? That would help with merging. Thanks! |
@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Hi @yma11, there are three unit tests failing internal: velox/dwio/parquet/tests/reader:e2e_filter_test - E2EFilterTest.list, E2EFilterTest.map, and E2EFilterTest.subfieldsPruning. They all fails with SIGABRT "Assertion '__n < this->size()' failed." with the same stack trace, so the issues are related. Below is the full stacktrace of E2EFilterTest.list.
What I'm seeing is that the unit test reads a struct that contains an array of integers. Because this PR removes the override of filterRowGroups() from ListColumnReader, when StructColumnReader::filterRowGroups() calls Could you take a look and see if you could reproduce this locally? Thanks! cc @Yuhta |
@@ -211,14 +211,6 @@ void MapColumnReader::read( | |||
elementReader_->seekTo(childTargetReadOffset_, false); | |||
} | |||
|
|||
void MapColumnReader::filterRowGroups( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot the base class implementation is not suitable for these 2 classes. Can we override the methods with empty implementations instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Sorry not noticing that. Let me update. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @yma11, just to confirm, do you see the failure when you run velox_parquet_e2e_filter_test? I wonder why the PR-time tests didn't catch this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @kagamiori, I can't reproduce any failures in this suite locally. FYI.
@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@kagamiori merged this pull request in e7a6ffc. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
There is a fast path for filter all null columns in
ScanSpec
based onstatistics
by assuming all null happens whenNumberOfValues
equals0
. But when dofilterRowGroups
against one line file with nullMap
, this rule will apply to theKey
column reader, which hastestNull()
asfalse
and finally decide this rowgroup should be skipped. It means this rule doesn't apply for the columns likeMap::Key
. But limited to the info that statistics can provide, we don't have a better way to refine this fast path, so we remove thefilterRowGroups
inMapColumnReader
in this PR.