[FEA] Either improve support for or remove type_id::EMPTY #12477
Labels: 0 - Backlog, feature request, libcudf
Is your feature request related to a problem? Please describe.
libcudf supports an empty type, type_id::EMPTY, that is analogous to Arrow's null type used to represent a column of all null values. However, functionality for this type is only implemented in pieces, and there are likely many cases where libcudf will fail if provided with such a column (#10761 is one somewhat recent example).
Describe the solution you'd like
We should reevaluate the usage of EMPTY columns in libcudf, either removing them altogether or making them work more consistently across the code base. Removal seems like the simplest path forward, but some parts of cuIO do appear to leverage EMPTY columns, and there's an argument to be made that we should maintain this type no matter what, for conformance with the Arrow spec. If we keep it, we should make it easier to test APIs with such columns to ensure that they are handled appropriately. We may also need to improve handling of these columns in the higher-level APIs backed by libcudf, such as cuDF Python or the Spark plugin.
Additional context
It's worth noting that AFAICT a null column is trivial to optimize storage for since all that's needed is a size (both null mask and data are redundant). I don't think such columns are useful enough to spend much engineering effort on optimizations, though.
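To make the storage point concrete, here is a rough sketch in plain C++ of an all-null column that carries nothing but a row count. The class all_null_column and all of its member names are hypothetical, invented for illustration; this is not libcudf's column type or API, just the idea that both the data buffer and the null mask can be derived from the size.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in (NOT a libcudf type): an all-null column whose only
// stored state is its row count.
class all_null_column {
 public:
  explicit all_null_column(std::size_t size) : size_{size} {}

  // Number of rows in the column.
  std::size_t size() const { return size_; }

  // Every row is null, so the null count always equals the size.
  std::size_t null_count() const { return size_; }

  // No row is ever valid; a null mask would carry no extra information.
  bool is_valid(std::size_t row) const {
    if (row >= size_) { throw std::out_of_range("row index out of range"); }
    return false;
  }

  // Neither a data buffer nor a null mask is allocated.
  std::size_t allocated_bytes() const { return 0; }

 private:
  std::size_t size_;
};
```

With this sketch, all_null_column{5} reports a size and null count of 5 while allocating no buffers at all.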