Trino+iceberg, select count(*) is too slow #19909

Closed
paulpaul1076 opened this issue Nov 27, 2023 · 1 comment

@paulpaul1076

I tried 2 queries:

select sum(record_count) from "clickstream_v3$files"

and

select count(*) from clickstream_v3

The first one is very fast; the second one is very slow. Why is the second one slow? It does not seem to use the metadata, and instead appears to read the Parquet files row by row.

@raunaqmorarka
Member

COUNT(*) queries on Parquet are not executed row by row; they are executed by reading row counts from the Parquet file footers.
Additionally, #19303 improved this so that only Iceberg metadata is read. Please try this on release 430 or later.
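
To make the difference between the two strategies concrete, here is a toy model in Python. It is only a sketch and not Trino's implementation: the `DataFile` class, its fields, and both functions are hypothetical names invented for illustration, and the model assumes a table without delete files, so that the per-file record counts add up to the table's row count.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataFile:
    """Hypothetical stand-in for one Parquet data file of an Iceberg table."""
    path: str
    # Count recorded in table metadata (compare the record_count column of "$files").
    record_count: int
    # Per-row-group row counts, as a Parquet footer would record them.
    row_group_counts: List[int]


def count_from_footers(files: List[DataFile]) -> int:
    """Footer strategy: visit every data file, but read only its footer."""
    total = 0
    for f in files:
        # In a real system this is one footer read per file.
        total += sum(f.row_group_counts)
    return total


def count_from_metadata(files: List[DataFile]) -> int:
    """Metadata strategy: sum the counts already stored in table metadata."""
    return sum(f.record_count for f in files)


# Made-up example data, not taken from the table in this issue.
files = [
    DataFile("part-0.parquet", 300, [100, 200]),
    DataFile("part-1.parquet", 50, [50]),
]

footer_total = count_from_footers(files)
metadata_total = count_from_metadata(files)
```

Both functions return the same total. In this model the footer strategy costs one read per data file, while the metadata strategy never opens a data file, which would be consistent with the `"$files"` query being the fast one.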
