-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add aggregation pushdown support for count using Iceberg Metrics #15832
base: master
Are you sure you want to change the base?
Conversation
import static java.util.concurrent.TimeUnit.MILLISECONDS; | ||
import static org.apache.iceberg.types.Conversions.fromByteBuffer; | ||
|
||
public class AggregateSplitSource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will/Can refactor to avoid the duplicate code of IcebergSplitSource
, as it's based on it.
But first thought of getting feedback on the approach, then will do further refactoring.
3046d5d
to
e64a256
Compare
e64a256
to
bc0df7e
Compare
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
I would argue it should still look at these. Many drafts are started and abandoned. I've closed out some already that weren't going to be worked on. |
Fair .. lets keep drafts in.. |
@mosabua @bitsondatadev yes Im working on it. Alex, whenever you get time, can you please check. |
Thanks for that background @osscm, I'll flag Alex if he doesn't look at this within the next few days! |
I'll check the conflicts. @alexjo2144 thanks a lot for the discussions. |
Hi Eric,
Thanks for your interest.
I’ll resume, have to rebase them PR.
Also, had a good discussion with the team, and requested if someone can
help to review the PR. So hoping to see progress soon.
Internally we have started working on the min/max as well.
…On Thu, Jun 15, 2023 at 2:25 PM Erin Drummond [BitMEX] < ***@***.***> wrote:
Any progress on this? I'm hanging out for this feature :)
—
Reply to this email directly, view it on GitHub
<#15832 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AXQ2PYTQQ5HANHMRMHTIW3DXLN4TNANCNFSM6AAAAAAUFZ72TY>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
bc0df7e
to
0b5ccb4
Compare
0b5ccb4
to
16cd7ee
Compare
Can you fix the checkstyle errors and the look at the test failure @osscm Also @martint @findepi @alexjo2144 .. can you have a look at this |
Hi team, could you please confirm if the approach looks good here and you have time to review this. We have added support for additional aggregation pushdowns and would like to contribute that as well once this goes through. Once we have confirmation on the approach we can go ahead and rebase this again if someone can take a look at it. As we progress, constant rebasing takes time too, so would appreciate feedback on the overall approach first. cc @martint @findepi @electrum @alexjo2144 @osscm |
for Lines 186 to 198 in 6f15559
|
assertQuery(clientSession, format("SELECT COUNT(*) from %s WHERE event_date = DATE '2022-11-02' AND country = 'FRA' AND state = 'Brittany'", tableName), "SELECT 1"); | ||
assertQuery(clientSession, format("SELECT COUNT(*) from %s WHERE event_date = DATE '2022-11-02' AND country = 'FRA' AND state = 'Brittany'", tableName), "SELECT 1"); | ||
|
||
assertQuery(clientSession, format("SELECT COUNT(*) from %s WHERE event_date = DATE '2022-11-02' AND country = 'USA' AND state = 'Corsica'", tableName), "SELECT 1"); | ||
assertQuery(format("SELECT COUNT(*) from %s WHERE event_date = DATE '2022-11-02' AND country = 'USA' AND state = 'Corsica'", tableName), "SELECT 1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need some tests which validate that agg pushdown actually happened by looking at the explain plan?
f392621
to
72b2cda
Compare
c32b0d4
to
38b9fc3
Compare
As we discussed, we will go ahead with this property check, and you will also check with the Iceberg's team to add this property to be part of the Spec, as it is already being updated. I have also discussed it with @flyrain (Iceberg team), and he also gave +1 for this. |
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time. |
We really need to get this in .. cc @alexjo2144 @findepi @findinpath @cwsteinbach @dain .. I am going to add the stale-ignore label so we wont have it closed again. |
Given discussion in https://trinodb.slack.com/archives/CJ6UC075E/p1722544669964209 I wonder how we can proceed here. cc @dain @findepi @raunaqmorarka |
@findepi @dain would it be fine to land it behind feature flag (disabled by default)? Additionally, we would like to implement similar logic as for Spark (https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java#L309). It's not perfect, so in the meantime we could work with Iceberg community to have more firm spec for column stats accuracy. Once this is done, we could enable it by default. |
@sopel39 there are basic unresolved comments at this point, so there is probably no point in another review round, least to talk about landing. |
@findepi that's exactly what I mean rather than landing without another review round.
It seems Iceberg itself is violating it's own spec with Spark unfortunately regarding to
Don't we prune already using Iceberg stats? Anyways, pruning and agg pushdown are completely different things. Also, we talked about
So we should be good to just have it enabled for |
Yes to both
You're right. I will copy #15832 (comment) here: Agreed. However, we need to understand what the approach would look like to support other aggregations than |
I think to get guarantees for other stats Iceberg spec has to be updated. My suggestion would be some kind of optional field in the metadata itself specifying if stats are estimates or not. cc @osscm |
Just like @raunaqmorarka did for Parquet footer. i like the idea. Note that it's not must-have to do some aggregation pushdown on top of iceberg. |
@findepi I've create Iceberg spec proposal: apache/iceberg#10930 |
fyi @cwsteinbach and @alexjo2144 |
Description
Iceberg maintains metrics of the data files in the Iceberg metadata. So, we can utilize these metrics to optimize the simple aggregate queries.
To achieve this, Query Engine needs to support aggregation pushdown at the connector layer. Trino connector framework supports aggregation pushdown but Iceberg and Hive connectors have not implemented it. Aggregation Pushdown.
If Iceberg connector can support it then we can use the Iceberg metrics to give some of the answers like count, min, and max with the partition filters. Currently, in Iceberg and Hive connectors, aggregations are handled at the engine level.
Important points
This is a single-threaded implementation and generates a single split to populate the page/data for the result.
We might be able to implement it in a distributed manner If an engine can support handling aggregation at the top level while handing the aggregation pushdown. Like how the engine handles count(*) by summing the result/count it gets from different workers.
This implementation will also not handle the situation where the count is not present for some files/partitions so that some metrics can be used partially, and for some data still, the metrics will be read from the actual data files.
My understanding is, that this would require the following feature to be done: #13534
It can handle positional delete but not equality-based deletes.
Additional context and related issues
ConnectorMetadata.applyAggregation(....)
can result into projected column per aggregate function, if it needs to pushed downapplyAggregation
is called by the QueryEngine in and latter on in the flow, it calls theConnectorSplitManager
to generate the splits.IcebergSplitManager
creates theIcebergSplitSource
now it can createAggregateSplitSource
as well.AggregateSplitSource
(like IcebergSplitSource) apply the partition filter and get the dataFiles needed, and then use the metrics. Now it can create AggregationSplit with the calculated count/metrics.IcebergPageSourceProvider.createPageSource
gets the split, now here if aggregation pushdown is used then create theAggregatePageSource
based on thecount/min/max
available in the split.This feature is controlled by session level flag
aggregation_pushdown_enabled
Following type of queries can be benefitted
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:
For #10974