Is your feature request related to a problem or challenge?
CREATE TABLE does not parse the Page Index, and SELECT does not cache it. This can make requests on large Parquet datasets take a significant time for queries that have a small number of results.
For example, consider a simple SELECT int_column, other_int_column WHERE int_column=123456 on a table with 184 billion rows (so about 9 million Page Index items, given the default page size of 20k rows).
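The estimate above can be reproduced with a quick calculation. This is only a back-of-the-envelope sketch: it assumes the 20k figure is a row count per page and counts entries for a single column; the variable names are illustrative.

```python
# Rough estimate of Page Index entries for one column.
# Assumption: every page holds the default 20,000 rows.
total_rows = 184_000_000_000
rows_per_page = 20_000

page_index_entries = total_rows // rows_per_page
print(f"{page_index_entries:,}")  # 9,200,000
```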
Describe the solution you'd like
Parse it once and for all, either on CREATE TABLE or lazily as SELECT queries read the files. (Note that in the case of partitioned tables, not all files may be read by the first SELECT)
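To make the lazy variant concrete, here is a minimal sketch of the caching behavior being requested. It is not DataFusion code: `LazyIndexCache`, `fake_parse_page_index`, and the file paths are all hypothetical names used for illustration, and the parser is a stand-in that only records which files it was asked to parse.

```python
from typing import Callable, Dict, Generic, Iterable, List, TypeVar

T = TypeVar("T")


class LazyIndexCache(Generic[T]):
    """Parse each file's index at most once, on first access."""

    def __init__(self, parse: Callable[[str], T]) -> None:
        self._parse = parse
        self._entries: Dict[str, T] = {}

    def get(self, path: str) -> T:
        # Parse only on a cache miss; later lookups reuse the stored result.
        if path not in self._entries:
            self._entries[path] = self._parse(path)
        return self._entries[path]

    def warm(self, paths: Iterable[str]) -> None:
        # Eager variant: parse everything up front (the CREATE TABLE option).
        for path in paths:
            self.get(path)


# Hypothetical stand-in for the real Page Index parser.
parse_calls: List[str] = []


def fake_parse_page_index(path: str) -> dict:
    parse_calls.append(path)
    return {"path": path, "pages": []}


cache = LazyIndexCache(fake_parse_page_index)

# First query touches a single partition.
cache.get("part=1/a.parquet")

# Second query touches both partitions; only the new file is parsed.
cache.get("part=1/a.parquet")
cache.get("part=2/b.parquet")

print(parse_calls)  # ['part=1/a.parquet', 'part=2/b.parquet']
```

Calling `warm` with the full file list would correspond to parsing on CREATE TABLE, while relying on `get` alone corresponds to the lazy option, where files in partitions that no query has touched stay unparsed.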
Describe alternatives you've considered
The example at https://github.com/apache/datafusion/blob/3b93cc952b889cec2364ad2490ae18ecddb3ca49/datafusion-examples/examples/advanced_parquet_index.rs is an option, but it requires using the low-level API and is not available through the SQL or Python interfaces.
Additional context
No response