-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trino-cli has no option to accept targetResultSize query parameter #22303
Comments
What is the use case you are trying to solve with exposing this parameter? The protocol is already conversational so there is no real limit for the overall data returned. |
Hi @mosabua Currently in below mentioned code I see that for query get request to fetch resultset there is a queryparameter targetResultSize defined with a default size as 16MB i.e 16MB of result set would be sent to client from coordinator per get request. Another thing to note here is this parameter is never set i.e it's always null because there is no way this parameter can be set from client. but I see a condition to check take min of this value and max target size which is 128MB if it is set. If this parameter can be set through client the queries that would fetch large resultset from coordinator mininization of the get request calls can be done which would reduce the wait time and queries would complete faster. ExecuteStatementResource.java( trino/core/trino-main/src/main/java/io/trino/server/protocol/ExecutingStatementResource.java Line 216 in 1b0a011
please let me know your comments. |
This might be something we can do in our current work on improving client protocol performance with Project Swift. Note that we don't know if increasing the size does indeed improve performance. Did you test this and can report any findings? Also .. ideally users would not have to configure such things just to get a faster connection. |
What I have observed is in code targetResultSize was set as QueryParam and was never getting used or set from trino client while running a query, so I was just curious on why it was not done. I have run a query which returns approximately 6GB of data with 7.7Million rows on a trino cluster from DBVisualizer using trino-jdbc client driver jar, these are my findigs:
|
@mosabua I can't agree with these numbers - I've been benchmarking changing this value and there is a diminishing improvement for values over 32 MB. The buffering and compression happens on the coordinator so setting this value high, puts an additional pressure on the coordinator. That doesn't scale well. |
My trino cluster has 6 nodes = 5 workers and 1 coordinator(doesn't act as worker) where each node has 1TB memory and 128 CPU where I have run the query. |
We don't want to expose this parameter to the clients. With the spooled client protocol extension (#22662) it won't be effective either way. |
Interesting, I just added this parameter to the Power BI connector for testing and I did notice a significant performance change on my local machine, where 16MB was the fastest for loading 500k rows into Power BI. TargetResultSize 1MB (default) --> time taken: 00:02:37 TargetResultSize 16MB --> time taken: 00:01:46 TargetResultSize 128MB --> time taken: 00:01:58 |
@pichlerpa this puts an additional pressure on the coordinator as the last response is cached in order for the client to be able to retry the last call. That's why it's not recommended approach. |
@wendigo I understand, but I think it's great to have the option at least. In case of Power BI, this can be very useful to speed up imports which often happen just once per night or a few times during the day. |
@pichlerpa that's why we've added a spooled protocol that allows you to have segments of a bigger size. In our benchmarks the new jdbc driver is 2-4x faster. |
@wendigo Ok, yes, it seems to work like Databricks' cloud fetch mechanism. But what if I don't have an object store available? Isn't this a relatively simple option to improve performance, as long as the coordinator isn't too busy at those times to avoid OOM errors? I don't really see the problem, if you know the system and functionality, why not? Isn't it better than having endless round-trips to collect query results. |
@pichlerpa we are considering adding a spooling manager that works with the local worker storage |
@wendigo is there any documentation or reference available for how to implement the spooling protocol in a new connector/client not relying on JDBC? Thanks for your feedback BTW! |
In query result GET request from coordinator it has a query parameter targetResultSize which is per get request how much a coordinator can send to client.
I see there is no way to configure it from trino client.
The text was updated successfully, but these errors were encountered: