Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump arrow2 #2855

Closed
wants to merge 6 commits into from
Closed

Bump arrow2 #2855

wants to merge 6 commits into from

Conversation

v0y4g3r
Copy link

@v0y4g3r v0y4g3r commented Jul 8, 2022

Which issue does this PR close?

Closes #2709.

Rationale for this change

Current branch arrow2 is still using arrow2 version v0.10 and falls far behind the latest version v0.12

What changes are included in this PR?

Dependencies upgraded:

  • arrow2: v0.10 -> v0.12
  • parquet2: v0.12 -> v0.13
  • arrow-format: 0.4->0.6
  • prost: 0.9 -> 0.10
  • prost-types: 0.9 -> 0.10
  • tonic: 0.6 -> 0.7
  • topic-build: 0.6 -> 0.7

API change:

  • arrow2::error::ArrowError -> arrow2::error::Error
  • arrow2 FileWriter now accepts Vec<Vec<Encoding>> instead of Vec<Encoding> and transverse can be used to create encodings from schema with a customized mapping.
  • arrow2 RowGroupMetadata no longer as a column(usize) method to fetch column metadata at given index, instead, it provides a columns() method that returns a column slice.
  • datafusion::dataframe::DataFrame::write_parquet method now accepts arrow::io::parquet::write::WriteOptions instead of parquet::write::WriteOptions
  • some other minor changes.

Are there any user-facing changes?

Yes, please refer to previous section for full API change list.
But I think it's rather easy for users of arrow2 branch to adapt to these changes.

About tests

Some tests are failling in current arrow2 branch, so I'm not going to fix all failing tests in this PR, instead I'll make sure no more unit test fails. I'd be happy to fix all failing tests after this PR is merged so I don't have to fix them twice :)

@github-actions github-actions bot added ballista datafusion Changes in the datafusion crate labels Jul 8, 2022
@andygrove
Copy link
Member

@v0y4g3r I'd like to start understanding more about the efforts here. Is the plan to put this behind a cargo feature so that we can switch between arrow and arrow2?

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some license headers missing:

NOT APPROVED: ballista/rust/core/src/memory_stream.rs (./ballista/rust/core/src/memory_stream.rs): false
NOT APPROVED: datafusion/src/physical_plan/sort.rs (./datafusion/src/physical_plan/sort.rs): false
NOT APPROVED: datafusion/tests/sql_integration.rs (./datafusion/tests/sql_integration.rs): false
NOT APPROVED: datafusion-physical-expr/src/test_util.rs (./datafusion-physical-expr/src/test_util.rs): false

@v0y4g3r
Copy link
Author

v0y4g3r commented Jul 9, 2022

@v0y4g3r I'd like to start understanding more about the efforts here. Is the plan to put this behind a cargo feature so that we can switch between arrow and arrow2?

No, it's simply about upgrading dependencies.

But I'm also interested in something like defining a trait to hide the API differences between arrow and arrow2 and use cargo feature to provide an option. But I have no ideas on how to implement that now.

There are some license headers missing:

NOT APPROVED: ballista/rust/core/src/memory_stream.rs (./ballista/rust/core/src/memory_stream.rs): false
NOT APPROVED: datafusion/src/physical_plan/sort.rs (./datafusion/src/physical_plan/sort.rs): false
NOT APPROVED: datafusion/tests/sql_integration.rs (./datafusion/tests/sql_integration.rs): false
NOT APPROVED: datafusion-physical-expr/src/test_util.rs (./datafusion-physical-expr/src/test_util.rs): false

Seems like current arrow2 branch doesn't have license header. But it's ok I added headers.

@v0y4g3r
Copy link
Author

v0y4g3r commented Jul 11, 2022

@andygrove Wondering what I still need to do in order to get this merged and I'd be happy to follow up.

Copy link

@dbr dbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully a easier to maintain solution can be found, but thanks for doing this, very useful in the mean time!

Tested this (with the feature flag removed as per other comment) in our application and it seems to work nicely

datafusion/src/lib.rs Outdated Show resolved Hide resolved
@v0y4g3r
Copy link
Author

v0y4g3r commented Jul 14, 2022

@andygrove Seems like some workflows need a maintainer's approval to run.

@houqp
Copy link
Member

houqp commented Jul 18, 2022

@v0y4g3r looks like there are some minor build issues that need to fixed.

@alamb alamb marked this pull request as draft August 9, 2022 15:17
@alamb
Copy link
Contributor

alamb commented Aug 9, 2022

Marking as draft to note it is not ready for merging -- please change it back to ready for review when it is ready. Thanks!

@v0y4g3r v0y4g3r marked this pull request as ready for review September 26, 2022 06:28
@v0y4g3r
Copy link
Author

v0y4g3r commented Sep 26, 2022

@v0y4g3r looks like there are some minor build issues that need to fixed.

These tests are failing in current arrow2 branch, and is not introduced by this PR.

Anyway, I'm working on fixing these tests in another pull request, but there's something different between arrow and arrow2.

@alamb
Copy link
Contributor

alamb commented Oct 12, 2022

Marking this PR as a draft as I am cleaning up the review queue. Please mark it as ready for review when it is

@alamb alamb marked this pull request as draft October 12, 2022 18:07
@alamb
Copy link
Contributor

alamb commented Jan 14, 2023

This PR is more than 6 month old, so closing it down for now to clean up the PR list. Please reopen if this is a mistake and you plan to work on it more

@alamb alamb closed this Jan 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants