GH-21761: [Python] accept pyarrow scalars in array constructor #36162
I am not sure why the linter is failing on this PR. I will try rebasing to see if it helps.
Branch updated from 08297d0 to c84c3a4.
The failing tests already have an open issue (#35728), so I think this PR is ready for review. One thing still missing is a test for sets, where the elements in the set get reordered when passed to the … (see arrow/python/pyarrow/tests/test_convert_builtin.py, lines 2448 to 2449 in 3da9ba6).
The failing tests are all due to a known issue with …
Will merge this today as the CI is green 👍
Thanks, Alenka!
Conbench analyzed the 6 benchmark runs on commit …; there were 4 benchmark results indicating a performance regression:
The full Conbench report has more details.
It looks like soon after I started investigating scalar conversions for #14121 (but well before I made the PR), a major underlying hole was plugged in pyarrow via apache/arrow#36162. Most of #14121 was created to give us a way to handle scalars from pyarrow generically in libcudf. Now that pyarrow scalars can easily be passed into arrays, we no longer really need separate scalar functions in libcudf: we can simply create an array from the scalar, put it into a table, and then call the table function.

Arrow also has a function for creating an array from a scalar. This function is not new but [was previously undocumented](apache/arrow#40373). The builder code added to libcudf in #14121 can be removed and replaced with that factory. The scalar conversion is then as simple as calling that Arrow function and using our preexisting `from_arrow` function on the resulting array.

For now, this PR is just a simplification of internals. Future PRs will remove the scalar API once we have a more standard path for converting arrays via the C Data Interface.

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Yunsong Wang (https://github.com/PointKernel)
- David Wendt (https://github.com/davidwendt)
- Bradley Dice (https://github.com/bdice)

URL: #15213
Rationale for this change
Currently, `pyarrow.array` doesn't accept a list of pyarrow scalars; this PR adds a check to allow that.