
Query state shows "Finishing" and stays there for a long time when the result set is large #23759

Closed
mkg2code opened this issue Oct 11, 2024 · 9 comments

Comments

@mkg2code

Use case: The partitioned data is around 1.3 million records, and all of it needs to be transferred to a reporting framework for slicing and dicing; it is also required for offline processing. The reporting framework is written in C#.

Observations:
Trino takes more than 1 minute to transfer the data from ADLS to the client (DBeaver). The Trino cluster has 3 workers (56 GB, 7 cores each) and 1 coordinator (24 GiB, 7 cores). DBeaver has a Java heap max limit of 10 GB.
The query state changes to "Finishing" quickly and stays there until the data transfer completes.

DuckDB can pull the same amount of data from the same source in 28 seconds, also executed from DBeaver.

Note: DBeaver is used as the client for benchmarking.

The data in ADLS is stored as Parquet and is only 160 MB, but the Trino plan shows around 664 MB being transferred.

Snippet from the Trino plan:

Trino version: 457
Queued: 401.91us, Analysis: 50.97ms, Planning: 135.56ms, Execution: 1.85s
Fragment 1 [SOURCE]
CPU: 3.55s, Scheduled: 4.85s, Blocked 112.26ms (Input: 0.00ns, Output: 112.28ms), Input: 1392597 rows (664.15MB); per task: avg.: 464199.00 std.dev.: 164122.31, Output: 1392597 rows (664.15MB)
Amount of input data processed by the workers for this stage might be skewed
Output layout: [field1, field2, ...]
Output partitioning: SINGLE []
TableScan[table = datalake:demo:table1]
Layout: [field1:integer, etc..]
Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}
CPU: 3.55s (100.00%), Scheduled: 4.85s (100.00%), Blocked: 112.00ms (100.00%), Output: 1392597 rows (664.15MB)
Input avg.: 174074.63 rows, Input std.dev.: 100.00%
reportid := reportid:int:PARTITION_KEY
:: [[8]]

Input: 1392597 rows (664.15MB), Physical input: 160.95MB, Physical input time: 993.74ms

Also, an ODBC driver that is highly optimized for data transfer would be appreciated.

@wendigo
Contributor

wendigo commented Oct 11, 2024

Can you try running JDBC with disableCompression=true as a connection parameter?
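
For reference, a minimal sketch (not from this thread) of passing that parameter through the Trino JDBC driver; the host, port, catalog, schema, and user below are placeholders. In DBeaver the same property can be set in the connection's driver properties.

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class TrinoConnectionExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "analyst");             // placeholder user
        props.setProperty("disableCompression", "true");  // parameter suggested above

        // Equivalently, append it to the URL:
        // jdbc:trino://trino.example.com:8080/datalake/demo?disableCompression=true
        String url = "jdbc:trino://trino.example.com:8080/datalake/demo"; // placeholder coordinator

        try (Connection conn = DriverManager.getConnection(url, props)) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}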

@wendigo
Contributor

wendigo commented Oct 11, 2024

The current protocol is not designed to transfer large amounts of data between the cluster and the client. For high-throughput cases we are introducing the spooled protocol (#22662), which can handle much larger transfers with higher throughput.
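
As a rough sketch of what enabling the spooled protocol might look like on the coordinator once it is released: the property names and file locations below are assumptions drawn from the spooling work referenced above, not something confirmed in this thread, and should be checked against the release documentation.

# etc/config.properties (assumed property names)
protocol.spooling.enabled=true
protocol.spooling.shared-secret-key=<base64-encoded 256-bit secret>

# etc/spooling-manager.properties (assumed; points the cluster at object storage for spooled result segments)
spooling-manager.name=filesystem
fs.location=<object storage location for spooled segments, e.g. an ADLS container>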

@wendigo wendigo closed this as completed Oct 11, 2024
@georgewfisher

georgewfisher commented Oct 11, 2024

@wendigo are column oriented results supported by the spooled protocol already?

#22662 is closed; where can @mkg2code track this if it's not available yet?

@wendigo
Contributor

wendigo commented Oct 11, 2024

@georgewfisher not yet :) We want to first announce the GA for the protocol and then plan next steps.

@wendigo
Contributor

wendigo commented Oct 11, 2024

@georgewfisher #22271 for tracking

@mani-sethu

What is the fix for this in the existing 444 version? I am facing the same issue: I am firing a query from Metabase, and even after I can see the message in Metabase, the query is stuck in the pending state.

@wendigo
Contributor

wendigo commented Oct 15, 2024

@mani-sethu there is no fix for 444

@mani-sethu

In what version did this get fixed?

@wendigo
Contributor

wendigo commented Oct 15, 2024

@mani-sethu this will be a preview feature in some future release; I can't say which right now. To be honest, this is not a bug: the client is expected to fully consume the results according to the protocol specification (and this applies to all versions).
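
To illustrate the last point, here is a hedged sketch of what "fully consume the results" means from a JDBC client: the server keeps the query in the Finishing/running state until every result page has been fetched, so the ResultSet has to be drained completely. The URL, user, and table below are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class ConsumeAllResults {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "analyst"); // placeholder user
        String url = "jdbc:trino://trino.example.com:8080/datalake/demo"; // placeholder coordinator

        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT field1, field2 FROM table1")) {
            long rows = 0;
            // Drain the ResultSet completely: the query is not finished from the
            // server's point of view until every page has been fetched.
            while (rs.next()) {
                rows++;
            }
            System.out.println("Consumed " + rows + " rows");
        }
        // If only part of the result is needed, Statement.cancel() can end the
        // query early instead of leaving the ResultSet unconsumed.
    }
}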
