[BUG] Parquet reader cannot use filters in tandem with column projection #15051
Labels: 1 - On Deck, bug, cuIO, libcudf
Describe the bug
Reading a column-projected parquet file (one with $N$ on-disk columns) while simultaneously applying a filter expression to the columns read is not possible unless the requested columns are exactly the first $K$ on-disk columns, i.e. indices $[0, \dots, K)$ with $K \le N$.
The problem appears to be that the filter expression must (eventually) refer to columns by index. One part of the code (the part that applies row-group filters) expects these indices to match the column indices in the un-projected file metadata, whereas another part needs them to match the indices of the requested, projected columns. No single set of indices can satisfy both when the projection reorders the columns or selects a non-dense subset of them.
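To make the mismatch concrete, here is a toy model in plain Python. It is not libcudf code: the function name `consistent_indices` and the column names are hypothetical, chosen only for illustration. It assumes, as described above, that one code path resolves a filter's column index against the on-disk schema and another resolves the same index against the projected output, and it reports the indices for which both paths would name the same column.

```python
def consistent_indices(file_columns, projection):
    """Return the indices i at which file_columns[i] and projection[i]
    name the same column, i.e. where an index-based column reference
    means the same thing to both (hypothetical) code paths."""
    return [
        i
        for i, name in enumerate(projection)
        if i < len(file_columns) and file_columns[i] == name
    ]


# Hypothetical on-disk schema with N = 4 columns.
file_columns = ["a", "b", "c", "d"]

# A dense prefix: every index agrees, so any filter is unambiguous.
prefix_ok = consistent_indices(file_columns, ["a", "b"])

# A reordering: no index refers to the same column in both views.
reordered = consistent_indices(file_columns, ["b", "a"])

# A non-dense subset: index 1 is "b" on disk but "c" in the projection.
sparse = consistent_indices(file_columns, ["a", "c"])

print(prefix_ok, reordered, sparse)
```

In this model only the prefix projection leaves every index unambiguous, which matches the observation that the read works only when the requested columns are the first $K$ on-disk columns.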
Steps/Code to reproduce bug
Build the cudf tests and then run:
Expected behavior
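Reading with a filter expression and a column projection should work for any selection of columns, including reordered and non-dense subsets.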