-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cost-based property enforcement #21785
Comments
cc @martint |
This has been on the agenda to discuss on the Trino Contributor Call a few times now. Ideally @devozerov can join next time and we can discuss more. https://github.com/trinodb/trino/wiki/Contributor-meetings |
@mosabua Sure, I can prepare a couple slides if that helps |
That would be good. Next contributor call is on the 26th of Sept. https://github.com/trinodb/trino/wiki/Contributor-meetings Ping me for a calendar invite if you can attend and will present. |
Motivation
Modern analytical engines use relational operator properties to find optimal plan. Property is a value associated with the operator that doesn't change operator's equivalence and that can be enforced via a special enforcer operator. In distributed systems, there are two common properties:
Exchange
operator.Sort
operator.State-of-the-art optimizers are able to find optimal placement of
Exchange
andSort
operators, possibly enforcing them in various places of the plan. For Exchange, the goal is to find a plan with minimal data movement while still accounting for data skew. For Sort, the goal is to enforce (or propagate) collations to allow for streaming aggregates and merge joins. Both Exchange and Sort placement should be determined via a cost-based optimization.Examples of products that uses state-of-the-art techniques for :
Note that many of these products are lakehouse contenders, and some of them are mentioned in recent Forrester Wave lakehouse report: https://reprints2.forrester.com/#/assets/2/848/RES180732/report
Historically, both Presto and Trino relied mostly on heurstic optimizations. Cost-based optimizations are scattered across the code base, yielding multiple local minima while failing to provide globally optimal plan. Examples are overlapping Exchange placement and data skewness checks in
AddExchanges
, parallelism estimation inDetermineTableScanNodePartitioning
, and limited partitioning propagation mechanics via opaqueConnectorPartitioningHandle
. There is no Sort placement optimizer. As a result neither Presto, nor Trino can sufficiently exploit properties during optimization. However, Presto community started discussion around the nextgen query optimizer, which means that the product most likely will gather mentioned features sooner or later.This puts Trino into vulnerable position, because sophisticated property enforcement in many cases allows other products to find better plans and demonstrate superior performance.
Proposal
This issues proposes to start the discussion about advanced property propagation in Trino. This includes (but not limited to):
ConnectorMetadata.getTableProperties|getCommonPartitioningHandle|makeCompatiblePartitioning
as they all assume static predetermined partitioningAddExchange
alternative based on MEMO and top-down Cascades-like algorithm.Sort
propagation rule (conceptually similar toAddExchange
) that will enable Trino using merge join and streaming aggregations.The text was updated successfully, but these errors were encountered: