Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ipc): Support writing dictionaries nested in structs and unions #870

Merged
merged 2 commits into from
Oct 29, 2021

Conversation

helgikrs
Copy link
Contributor

Which issue does this PR close?

Part of #846

Rationale for this change

Dictionaries are lost when serializing a RecordBatch for IPC, producing invalid arrow data. This PR changes encoded_batch to recursively find all dictionary fields within the schema (currently only in structs and unions) so nested dictionaries are properly serialized.

Dictionaries are lost when serializing a RecordBatch for IPC, producing
invalid arrow data. This PR changes encoded_batch to recursively find
all dictionary fields within the schema (currently only in structs and
unions) so nested dictionaries are properly serialized.
@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 26, 2021
@helgikrs helgikrs changed the title feat(ipc): Support for writing dictionaries nested in structs and unions feat(ipc): Support writing dictionaries nested in structs and unions Oct 26, 2021
@codecov-commenter
Copy link

Codecov Report

Merging #870 (86c699f) into master (e44f1ad) will decrease coverage by 0.01%.
The diff coverage is 71.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #870      +/-   ##
==========================================
- Coverage   82.45%   82.43%   -0.02%     
==========================================
  Files         168      168              
  Lines       48150    48206      +56     
==========================================
+ Hits        39700    39741      +41     
- Misses       8450     8465      +15     
Impacted Files Coverage Δ
arrow/src/array/cast.rs 85.71% <ø> (ø)
arrow/src/ipc/writer.rs 84.09% <71.42%> (-2.14%) ⬇️
parquet_derive/src/parquet_field.rs 66.59% <0.00%> (-0.23%) ⬇️
parquet/src/arrow/array_reader.rs 77.81% <0.00%> (+0.08%) ⬆️
arrow/src/array/array_union.rs 90.24% <0.00%> (+0.97%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e44f1ad...86c699f. Read the comment docs.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes sense to me. Thank you @helgikrs

When #846 is done, it would be great to have an end to end test (such as the reproducer shown on that ticket) for reading/writing structs with nested dictionaries


if let DataType::Dictionary(_key_type, _value_type) = column.data_type() {
) -> Result<()> {
// TODO: Handle other nested types (map, list, etc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth filing another tracking ticket if you have time

@alamb alamb merged commit c7cf8f7 into apache:master Oct 29, 2021
alamb pushed a commit that referenced this pull request Nov 4, 2021
…870)

* feat(ipc): Support for writing dictionaries nested in structs and unions

Dictionaries are lost when serializing a RecordBatch for IPC, producing
invalid arrow data. This PR changes encoded_batch to recursively find
all dictionary fields within the schema (currently only in structs and
unions) so nested dictionaries are properly serialized.

* address lint and clippy
alamb added a commit that referenced this pull request Nov 5, 2021
…870) (#915)

* feat(ipc): Support for writing dictionaries nested in structs and unions

Dictionaries are lost when serializing a RecordBatch for IPC, producing
invalid arrow data. This PR changes encoded_batch to recursively find
all dictionary fields within the schema (currently only in structs and
unions) so nested dictionaries are properly serialized.

* address lint and clippy

Co-authored-by: Helgi Kristvin Sigurbjarnarson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants