Support pass-through queries for the jdbc based connectors #9163
This isn't true; there is ongoing work in this direction in #7994.
Agreed. This will most likely always require passthrough, though, because functions don't have the same semantics across databases. An alternative we have been thinking about is to let connectors rewrite Trino function calls into equivalent expressions for the remote system.
This is not an easy problem to solve. The query parser needs to be able to parse the query in order to analyze, plan, and optimize it. Since the grammar differs across databases, this would require implementing a SQL translation layer, which is itself a very large project.

One of the biggest problems with passthrough queries is that they are a footgun when used incorrectly. As long as only a single target system is involved, things are fine: the semantics are consistent, and you get consistent output (according to the remote database's semantics). But as soon as multiple such catalogs are in play, it becomes very complicated and easy to get silently incorrect results. For this reason, IMO the effort is better spent on the roadmap items in #18.
I totally understand why implementing a translation layer is a huge project, but why is the naive approach of passing the query through to the underlying system as a string not feasible?
Is that a bad thing? If you have two Postgres catalogs, each running a different version of Postgres, and you send a passthrough query to each, the expectation is that you will get slightly different results depending on which Postgres database you are talking to. Pass-through queries solve real problems that currently have no good workarounds. While I agree there is potential to misuse them, does the risk of misuse outweigh the benefits? IMO, no.
Would it work to create a view in Redshift and select from it in Trino? The entire view query would be executed in Redshift.
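A minimal sketch of this workaround, assuming a hypothetical `sales_summary` view in Redshift and a Trino catalog named `redshift` (all names here are made up for illustration):

```sql
-- In Redshift: wrap the complex query in a view (hypothetical names).
CREATE VIEW analytics.sales_summary AS
SELECT region, SUM(amount) AS total_amount
FROM sales
GROUP BY region;

-- In Trino: selecting from the view causes the entire view query
-- to execute inside Redshift; only the result rows cross the wire.
SELECT * FROM redshift.analytics.sales_summary;
```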
@electrum yes, that is a workaround (as is materializing a view in Redshift), and it is what we do today. But when using additional tooling on top of Trino, like dbt, having to model one-off queries as views gets bloated pretty quickly (see this thread for the motivation: https://trinodb.slack.com/archives/CFQAMGRQE/p1631029170025400?thread_ts=1630955221.022000&cid=CFQAMGRQE)
@grantatspothero we now have support for Polymorphic Table Functions (see https://trino.io/docs/current/release/release-381.html). |
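Building on table function support, the JDBC-based connectors gained a `query` table function that sends a raw query string to the remote system for execution there. A sketch, assuming a catalog named `redshift` and hypothetical `events` table and column names; note the Redshift-specific `APPROXIMATE COUNT(DISTINCT ...)` syntax, which only works because the string is executed by Redshift itself:

```sql
-- The query string is passed verbatim to the remote database
-- and executed there; Trino only receives the result rows.
SELECT *
FROM TABLE(
  redshift.system.query(
    query => 'SELECT region, APPROXIMATE COUNT(DISTINCT user_id) FROM events GROUP BY region'
  )
);
```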
The Postgres/Redshift connectors do not support pushdown of arbitrary operations (see docs). This makes certain types of queries extremely inefficient, since all the data is pulled out of Postgres/Redshift and processed in memory in Trino.
Since supporting pushdown of arbitrary operations is not wanted or feasible, being able to write a pass-through query in Trino that runs inside Postgres/Redshift would be extremely helpful for optimizing queries where the current pushdown optimizations are not good enough.
There is some precedent for this in Trino: @bitsondatadev noted that the Elasticsearch connector currently supports a form of pass-through queries: https://trino.io/docs/current/connector/elasticsearch.html#pass-through-queries
Examples of problems we have today:
How pass-through queries could help solve the above problems (with a fake syntax):
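One way such a fake syntax could look (this is entirely hypothetical, not an actual or proposed Trino feature; the catalog, table, and column names are made up):

```sql
-- Hypothetical PASSTHROUGH construct: the query string is sent
-- to the named catalog's remote system as-is and executed there.
SELECT *
FROM PASSTHROUGH(
  catalog => 'postgres',
  query => 'SELECT id FROM events WHERE payload::json->>''type'' = ''click'''
);
```

The inner query uses a Postgres-specific JSON operator that Trino cannot push down today, which is exactly the kind of case where pass-through avoids pulling the whole table into Trino.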