Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cost-based property enforcement #21785

Open
devozerov opened this issue May 1, 2024 · 4 comments
Open

Cost-based property enforcement #21785

devozerov opened this issue May 1, 2024 · 4 comments

Comments

@devozerov
Copy link
Contributor

Motivation

Modern analytical engines use relational operator properties to find optimal plan. Property is a value associated with the operator that doesn't change operator's equivalence and that can be enforced via a special enforcer operator. In distributed systems, there are two common properties:

  1. Distribution - defines how the operator data is distributed across execution units. Enforced via Exchange operator.
  2. Collation - defines how operator's output is sorted. Enforced via Sort operator.

State-of-the-art optimizers are able to find optimal placement of Exchange and Sort operators, possibly enforcing them in various places of the plan. For Exchange, the goal is to find a plan with minimal data movement while still accounting for data skew. For Sort, the goal is to enforce (or propagate) collations to allow for streaming aggregates and merge joins. Both Exchange and Sort placement should be determined via a cost-based optimization.

Examples of products that uses state-of-the-art techniques for :

  1. Greenplum ORCA uses sophisticated property propagation mechanics based on Cascades: https://15721.courses.cs.cmu.edu/spring2016/papers/p337-soliman.pdf
  2. Azue Synapse finds overlapping Exchanges with a cost-based optimizer (see paragraph 3): https://vldb.org/pvldb/vol15/p936-rajan.pdf
  3. Apache Calcite uses memorization and Cascades to find optimal enforcer placement. Any product that uses Apache Calcite can utilize this feature easily. Examples are Dremio and Apache Hive.
  4. Vertica actively uses sorted projections to utilize merge joins.
    Note that many of these products are lakehouse contenders, and some of them are mentioned in recent Forrester Wave lakehouse report: https://reprints2.forrester.com/#/assets/2/848/RES180732/report

Historically, both Presto and Trino relied mostly on heurstic optimizations. Cost-based optimizations are scattered across the code base, yielding multiple local minima while failing to provide globally optimal plan. Examples are overlapping Exchange placement and data skewness checks in AddExchanges, parallelism estimation in DetermineTableScanNodePartitioning, and limited partitioning propagation mechanics via opaque ConnectorPartitioningHandle. There is no Sort placement optimizer. As a result neither Presto, nor Trino can sufficiently exploit properties during optimization. However, Presto community started discussion around the nextgen query optimizer, which means that the product most likely will gather mentioned features sooner or later.

This puts Trino into vulnerable position, because sophisticated property enforcement in many cases allows other products to find better plans and demonstrate superior performance.

Proposal

This issues proposes to start the discussion about advanced property propagation in Trino. This includes (but not limited to):

  1. Allow tables expose different partitioning schemes depending in the parent requirements. This includes revision of ConnectorMetadata.getTableProperties|getCommonPartitioningHandle|makeCompatiblePartitioning as they all assume static predetermined partitioning
  2. Add cost-based AddExchange alternative based on MEMO and top-down Cascades-like algorithm.
  3. Integrate Sort propagation rule (conceptually similar to AddExchange) that will enable Trino using merge join and streaming aggregations.
@findepi
Copy link
Member

findepi commented May 10, 2024

cc @martint

@mosabua
Copy link
Member

mosabua commented Jul 26, 2024

This has been on the agenda to discuss on the Trino Contributor Call a few times now. Ideally @devozerov can join next time and we can discuss more.

https://github.com/trinodb/trino/wiki/Contributor-meetings

@devozerov
Copy link
Contributor Author

devozerov commented Aug 1, 2024

@mosabua Sure, I can prepare a couple slides if that helps

@mosabua
Copy link
Member

mosabua commented Aug 26, 2024

That would be good. Next contributor call is on the 26th of Sept.

https://github.com/trinodb/trino/wiki/Contributor-meetings

Ping me for a calendar invite if you can attend and will present.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants