parquet: Make page_index/pushdown metrics consistent with row_group metrics #12545
…etrics

1. Rename `{pushdown,page_index}_filtered` to `{pushdown,page_index}_pruned`
2. Add `{pushdown,page_index}_matched`

The latter makes it clearer in EXPLAIN ANALYZE when the Page Index is not checked because their row groups were already eliminated (with a Bloom Filter or row group statistics).
Thank you @progval -- this looks like a very nice improvement to me. I left some small suggestions but I don't think they are required to merge this PR
datafusion/core/src/datasource/physical_plan/parquet/page_filter.rs
@@ -276,6 +281,14 @@ fn rows_skipped(selection: &RowSelection) -> usize {
    .fold(0, |acc, x| if x.skip { acc + x.row_count } else { acc })
}

/// returns the number of rows not skipped in the selection
/// TODO should this be upstreamed to RowSelection?
This looks the same as https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.RowSelection.html#method.row_count
It would be great to upstream this and `rows_skipped` to parquet -- any chance you are willing to file a ticket to do so?
sent to apache/arrow-rs#6429
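For readers following this thread, the two helpers under discussion can be sketched roughly as below. This is a minimal illustration, not the code from the PR: `RowSelector` here is a local stand-in that only mirrors the `skip` and `row_count` fields visible in the diff, and `rows_selected` is a hypothetical name for the "rows not skipped" helper.

```rust
/// Local stand-in (an assumption for this sketch) for one run of a row
/// selection: `row_count` consecutive rows that are either skipped or kept.
#[derive(Debug, Clone, Copy)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

/// Number of rows the selection skips, equivalent to the fold in the diff.
fn rows_skipped(selectors: &[RowSelector]) -> usize {
    selectors
        .iter()
        .filter(|s| s.skip)
        .map(|s| s.row_count)
        .sum()
}

/// Number of rows the selection keeps (hypothetical helper name).
fn rows_selected(selectors: &[RowSelector]) -> usize {
    selectors
        .iter()
        .filter(|s| !s.skip)
        .map(|s| s.row_count)
        .sum()
}
```

For any selection, the two counts add up to the total number of rows it covers, which is what lets a "pruned" and a "matched" metric be reported side by side.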
@@ -223,6 +223,21 @@ Again, reading from bottom up:
- `SortPreservingMergeExec`
- `output_rows=5`, `elapsed_compute=2.375µs`: Produced the final 5 rows in 2.375µs (microseconds)

When predicate pushdown is enabled, `ParquetExec` gains the following metrics:
❤️
Co-authored-by: Andrew Lamb <[email protected]>
Thanks again @progval
…etrics (apache#12545)

* parquet: Make page_index/pushdown metrics consistent with row_group metrics

  1. Rename `{pushdown,page_index}_filtered` to `{pushdown,page_index}_pruned`
  2. Add `{pushdown,page_index}_matched`

  The latter makes it clearer in EXPLAIN ANALYZE when the Page Index is not checked because their row groups were already eliminated (with a Bloom Filter or row group statistics).

* Add missing metric definitions in the docs

  Co-authored-by: Andrew Lamb <[email protected]>

* s/pass/select/

---------

Co-authored-by: Andrew Lamb <[email protected]>
Which issue does this PR close?
Closes #12543.
Closes #12544.
What changes are included in this PR?
1. Rename `{pushdown,page_index}_filtered` to `{pushdown,page_index}_pruned`
2. Add `{pushdown,page_index}_matched`
Rationale for this change
The latter makes it clearer in EXPLAIN ANALYZE when the Page Index is not checked because their row groups were already eliminated (with a Bloom Filter or row group statistics).
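To illustrate the rationale, here is a small sketch of how a pruned/matched counter pair behaves. It is not DataFusion's implementation: `StageMetrics`, `page_index_metrics`, and the input shape are all hypothetical, chosen only to show why both counters stay at zero when an earlier stage has already eliminated every row group.

```rust
/// Hypothetical counter pair for one pruning stage.
#[derive(Debug, Default, PartialEq)]
struct StageMetrics {
    pruned: usize,
    matched: usize,
}

/// Each entry describes one row group:
/// - `None`: already eliminated earlier (e.g. by a Bloom Filter or row
///   group statistics), so the Page Index is never consulted for it;
/// - `Some((selected, skipped))`: row counts the Page Index kept and
///   skipped within a surviving row group.
fn page_index_metrics(row_groups: &[Option<(usize, usize)>]) -> StageMetrics {
    let mut metrics = StageMetrics::default();
    for row_group in row_groups {
        // Eliminated row groups contribute to neither counter.
        if let Some((selected, skipped)) = row_group {
            metrics.matched += *selected;
            metrics.pruned += *skipped;
        }
    }
    metrics
}
```

With a single "filtered" counter, a value of zero is ambiguous: the Page Index may have been checked and pruned nothing, or it may not have been checked at all. With the pair, `pruned == 0 && matched == 0` points to the second case.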
Are these changes tested?
yes
Are there any user-facing changes?
New metrics in `EXPLAIN ANALYZE`, documented in docs/source/user-guide/explain-usage.md