Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arrow-ipc: Default to not preserving dict IDs #6788

Merged
merged 2 commits into from
Nov 30, 2024

Conversation

brancz
Copy link
Contributor

@brancz brancz commented Nov 25, 2024

Which issue does this PR close?

Related to #5981

Rationale for this change

This is the first step towards removing the dict_id field as discussed in #5981. With this patch the default behavior changes to what the behavior will be once the field is fully removed.

The previous behavior can still be restored by passing with_preserve_dict_id(true), however, doing so is now deprecated and will be removed together with the dict_id in the next (March) DataFusion release.

What changes are included in this PR?

Default to not preserving the dict ID from the schema field dict_id.

Are there any user-facing changes?

Not a breaking change to an API, but the default behavior changes.

@tustvold @alamb

@tustvold
Copy link
Contributor

The integration tests seem to be legitimately failing on this

@brancz
Copy link
Contributor Author

brancz commented Nov 25, 2024

Is this the appropriate docs page to look at for trying to reproduce this locally? https://github.com/apache/arrow-rs/tree/main/arrow-integration-testing

@brancz
Copy link
Contributor Author

brancz commented Nov 26, 2024

So diving in, it looks like ipc, and c-data work fine, it's just flight. And surprisingly even rust-to-rust seems broken, which is what I'm going to start with, by adding more tests to arrow-flight.

Previously the integration tests forced preserving dict IDs in some
places and used the default in others. This worked fine previously
because preserving dict IDs used to be the default, but it isn't
anymore.
@brancz brancz force-pushed the not-preserve-default branch from 75ceb39 to 206f7f4 Compare November 26, 2024 17:31
@brancz
Copy link
Contributor Author

brancz commented Nov 26, 2024

@tustvold @alamb tests are now green!

@tustvold tustvold added the api-change Changes to the arrow API label Nov 27, 2024
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, I've labelled it an API change so it is rendered as a breaking change in the changelog

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also makes sense to me too

@alamb
Copy link
Contributor

alamb commented Nov 30, 2024

Thank you @brancz 🙏

@alamb alamb merged commit d7581ef into apache:main Nov 30, 2024
26 checks passed
@brancz brancz deleted the not-preserve-default branch November 30, 2024 18:34
@brancz
Copy link
Contributor Author

brancz commented Nov 30, 2024

Thank you!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants