Commit
Struct and map support for Parquet reader (#6318)
* [struct] Fix struct filtering. Filtering a table that contains struct-columns fails because struct columns cannot yet be deep-copied from a column-view. This commit fixes the problem.
* Initial struct dtype
* [struct] Fix struct filtering. Added Struct<List> test. Removed errant prints, extra whitespace.
* [struct] Fix struct filtering. Added tests for cloning Struct<List> and List<Struct<List>> columns. Code formatting has been fixed, also.
* Add a __repr__ for struct dtype
* Initial struct column support
* Post-process to fix struct names
* Copy struct field names over to libcudf result
* Fix typo
* Handle all null child in struct
* Mask handling in StructColumn.from_arrow. Add tests
* Struct dtype equality tests
* Fields ordering test
* Struct and map support for Parquet reader.
* Changelog
* Clang format.
* Changelog for 6318
* Add tests for StructDtype.fields
* Fixed a bug in the column_buffer move constructor: was not propagating is_nullable flag.
* Add hierarchy column name information to table_with_metadata struct for the parquet reader.
* PR review changes.
* Change the code that returns column hierarchy names to include all cudf columns in the hierarchy, including things like list or string offsets - basically things that aren't actually part of the schema. Python needs this info to intelligently traverse the hierarchy and map columns to specific field names in structs.
* Add back ListColumn.to_arrow() for now
* Remove breakpoints
* Simplify categorical handling in to_arrow
* Fix up more to_arrow issues
* Keep specialization for lists
* Construct dtype from children in to_arrow()
* Test for parquet struct reader.
* PR comment changes.
* Address review comments
* Another round of cpp PR changes. Fixed merge conflicts in python.
* Fix base size of StructColumn
* Fixing up logic for generating elements in ListColumn.to_arrow
* Undo string compare change in is_list_dtype
* Remove duplicates in CHANGELOG. Parameterized struct tests.
* Make sure to assert on expect_eq

Co-authored-by: Mithun RK <[email protected]>
Co-authored-by: Ashwin Srinath <[email protected]>
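
The hierarchy-names item in the message above says the reader reports a name for every column in the hierarchy, including non-schema children such as list offsets, so that Python can walk the tree and attach field names to structs. A minimal sketch of that kind of lockstep traversal follows. It is not the code from this commit: `Column`, `NameInfo`, `describe`, and the placeholder names `"offsets"` and `"element"` are hypothetical, and the layout of a list column as (offsets, element) children is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    """Hypothetical stand-in for a nested column: a kind plus child columns."""
    kind: str  # "struct", "list", or a leaf type name such as "int32"
    children: List["Column"] = field(default_factory=list)


@dataclass
class NameInfo:
    """Hypothetical stand-in for one node of the reported name hierarchy."""
    name: str
    children: List["NameInfo"] = field(default_factory=list)


def describe(col: Column, info: NameInfo):
    """Walk both trees in lockstep and return a nested type description."""
    if col.kind == "struct":
        # Every child of a struct is a schema field; its name comes from info.
        return {
            child_info.name: describe(child, child_info)
            for child, child_info in zip(col.children, info.children)
        }
    if col.kind == "list":
        # Assumed layout: child 0 is the offsets column (not part of the
        # schema, but still present in the name hierarchy), child 1 is the
        # element column.
        return [describe(col.children[1], info.children[1])]
    # Leaf column: report its type name.
    return col.kind


# struct<a: int32, b: list<struct<x: string>>>
col = Column("struct", [
    Column("int32"),
    Column("list", [Column("int32"), Column("struct", [Column("string")])]),
])
info = NameInfo("s", [
    NameInfo("a"),
    NameInfo("b", [NameInfo("offsets"), NameInfo("element", [NameInfo("x")])]),
])

result = describe(col, info)
```

Because the name tree mirrors the column tree one-to-one, the walker can index children positionally and never has to guess which entries belong to the schema.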
1 parent 305e424, commit 84557ea
Showing 22 changed files with 1,148 additions and 564 deletions.