[FEA] Upgrade Arrow support to 4.0.0 #7224
Comments
Also relevant: Arrow format version support vs. library version dependency; see http://arrow.apache.org/docs/format/Versioning.html. I didn't find a clear breakdown on their site for tracking these.
Hey @mughetto, we're currently evaluating our options here. While there are people wanting newer versions of Arrow, others need to continue using Arrow 1.x, so ideally we'd support 1.0.1+, but that's not 100% straightforward to do. We're currently looking into this.
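Supporting a version range like 1.0.1+ would presumably come down to a runtime check on the installed library version. A minimal sketch using only the standard library (the helpers `version_tuple` and `pyarrow_in_range` are hypothetical, not cudf's actual mechanism):

```python
from importlib.metadata import PackageNotFoundError, version

def version_tuple(v: str) -> tuple:
    """Parse a version string like '4.0.1' into (4, 0, 1) for comparison."""
    return tuple(int(part) for part in v.split(".")[:3] if part.isdigit())

def pyarrow_in_range(minimum: str = "1.0.1", below: str = "5.0.0") -> bool:
    """Return True if the installed pyarrow falls inside [minimum, below)."""
    try:
        installed = version_tuple(version("pyarrow"))
    except PackageNotFoundError:
        return False  # pyarrow is not installed at all
    return version_tuple(minimum) <= installed < version_tuple(below)
```

Note this only covers the Python-level dependency; the harder part mentioned above is that the C++ libcudf build is pinned to a specific libarrow ABI.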
To Keith's point: more important to us than the particular decision is clarity on RAPIDS release timing when changing Arrow formats, and ideally a sense of what the expected impact area would be.
Yeah, the chain of dependencies and who fetches what where is a bit hard to track down at the moment :/
@kkraus14 Ok, thanks a lot for the quick answer!
This issue has been labeled |
Pandas 1.2 support was added in #7375 |
This was attempted in #7495, but it was found that there were blocking bugs in both Arrow 2.0.0 and 3.0.0 that prevented upgrading. These should be fixed in 4.0.0, where we'll try to upgrade again.
@kkraus14 @galipremsagar Just came across this issue while using cuDF -- I have some other requirements that rely on newer Arrow versions. Could you provide any details about the blocking bugs you encountered in Arrow 2 / 3? I didn't see anything obvious in the comments or gpuCI build logs of #7495. |
It was discussed on the Arrow mailing list. On Arrow 3.0.0, you can't create Arrow Arrays or Arrow Tables from GPU-backed Buffer objects. In Arrow 2.0.0 there was a bug that prevented round-tripping list-of-struct columns in the Parquet reader/writer. We believe all of the currently known issues on our side are fixed at the current tip of Arrow, and the 4.0.0 release is scheduled for April, which would give us plenty of time to upgrade in our 0.20 release.
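For reference, the Arrow 2.0.0 Parquet issue described above was about round-tripping list-of-struct columns. A minimal reproduction sketch in pyarrow (guarded so it is a no-op when pyarrow isn't installed; on a fixed Arrow release the round trip succeeds):

```python
import io

try:
    import pyarrow as pa
    import pyarrow.parquet as pq
    have_pyarrow = True
except ImportError:
    have_pyarrow = False

if have_pyarrow:
    # A list<struct<a: int, b: string>> column -- the shape that failed to
    # round-trip through Parquet under Arrow 2.0.0.
    data = [[{"a": 1, "b": "x"}], None, [{"a": 2, "b": "y"}, {"a": 3, "b": "z"}]]
    table = pa.table({"col": pa.array(data)})
    sink = io.BytesIO()
    pq.write_table(table, sink)
    sink.seek(0)
    assert pq.read_table(sink).equals(table)
```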
Fixes: #7224

This PR:
- [x] Adds support for arrow 4.0.1 in cudf.
- [x] Moves testing-related utilities to `cudf.testing` module.
- [x] Fixes miscellaneous errors related to arrow upgrade.

Authors:
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Paul Taylor (https://github.com/trxcllnt)

Approvers:
- Robert Maynard (https://github.com/robertmaynard)
- Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)
- Jeremy Dyer (https://github.com/jdye64)
- Paul Taylor (https://github.com/trxcllnt)
- Dillon Cullinan (https://github.com/dillon-cullinan)
- Devavret Makkar (https://github.com/devavret)
- Keith Kraus (https://github.com/kkraus14)
- Michael Wang (https://github.com/isVoid)
- Dante Gama Dessavre (https://github.com/dantegd)

URL: #7495
Hi,
we have developed an internal Python package that uses pandas to read Parquet files with the pyarrow engine. After some testing a few months ago, it appeared that pyarrow 2.0 was much faster than 1.0 for large files, so we decided to enforce pyarrow>=2.0 in our requirements.txt.
But now that we want to use this package in a RAPIDS environment (say 0.17, plus our local package installed with pyarrow 2.0), we have noticed that
`import cudf`
was failing with an import-time traceback. Bumping back to pyarrow 1.0.1 solved the issue, but it represents a loss of performance for us on the pure-pandas side.
Are we missing something obvious that would allow us to get pyarrow 2.0 to work with cudf? If not, are there any plans to make cudf compatible with more recent versions of pyarrow (3.0 as of yesterday)?
Please let me know if you need more details about our environments.
Thanks a lot!