Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-21761: [Python] accept pyarrow scalars in array constructor #36162

Conversation

AlenkaF
Copy link
Member

@AlenkaF AlenkaF commented Jun 19, 2023

Rationale for this change

Currently, pyarrow.array doesn't accept list of pyarrow Scalars and this PR adds a check to allow that.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jun 19, 2023
@github-actions github-actions bot added awaiting changes Awaiting changes awaiting review Awaiting review and removed awaiting committer review Awaiting committer review awaiting review Awaiting review awaiting changes Awaiting changes labels Jun 20, 2023
@AlenkaF
Copy link
Member Author

AlenkaF commented Jun 20, 2023

I am not sure why linter is failing on this PR. I will try to rebase and see if it helps.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jun 20, 2023
@AlenkaF AlenkaF force-pushed the gh-21761_array-accepting-scalars-using-cpp branch from 08297d0 to c84c3a4 Compare June 20, 2023 09:12
@AlenkaF
Copy link
Member Author

AlenkaF commented Jun 22, 2023

The tests failing already have an open issue: #35728, so I think this PR is ready for review.

One missing thing are the test for sets where the elements in the set get reordered when getting passed to the pa.array constructor. Not sure there is a solution for it.

if type(seq(scalar_data)) == set:
pytest.skip("TODO: The elements in the set get reordered.")

@AlenkaF AlenkaF marked this pull request as ready for review June 22, 2023 10:03
@github-actions github-actions bot added the awaiting change review Awaiting change review label Jun 29, 2023
@AlenkaF
Copy link
Member Author

AlenkaF commented Jun 29, 2023

The failing tests are all due to a known issue with test_total_bytes_allocated.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jun 29, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jul 3, 2023
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jul 4, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jul 5, 2023
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Jul 5, 2023
@AlenkaF
Copy link
Member Author

AlenkaF commented Jul 5, 2023

Will merge this today as the CI is green 👍

@jorisvandenbossche jorisvandenbossche merged commit b116b8a into apache:main Jul 5, 2023
@jorisvandenbossche jorisvandenbossche removed the awaiting merge Awaiting merge label Jul 5, 2023
@jorisvandenbossche
Copy link
Member

Thanks Alenka!

@AlenkaF AlenkaF deleted the gh-21761_array-accepting-scalars-using-cpp branch July 5, 2023 09:08
@conbench-apache-arrow
Copy link

Conbench analyzed the 6 benchmark runs on commit b116b8ab.

There were 4 benchmark results indicating a performance regression:

The full Conbench report has more details.

rapids-bot bot pushed a commit to rapidsai/cudf that referenced this pull request Mar 8, 2024
It looks like soon after I started investigating scalar conversions for #14121 (but well before I made the PR) a major underlying hole was plugged in pyarrow via apache/arrow#36162. Most of #14121 was created to give us a way to handle scalars from pyarrow generically in libcudf. Now that pyarrow scalars can be easily tossed into arrays, we no longer really need separate scalar functions in libcudf; we can simply create an array from the scalar, put it into a table, and then call the table function.

Additionally, arrow also has a function for creating an array from a scalar. This function is not new but [was previously undocumented](apache/arrow#40373). The builder code added to libcudf in #14121 can be removed and replaced with that factory. The scalar conversion is as simple as calling that arrow function and then using our preexisting `from_arrow` function on the resulting array.

For now this PR is just a simplification of internals. Future PRs will remove the scalar API once we have a more standard path for the conversion of arrays via the C Data Interface.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - David Wendt (https://github.com/davidwendt)
  - Bradley Dice (https://github.com/bdice)

URL: #15213
@vyasr vyasr mentioned this pull request Aug 22, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Python] accept pyarrow values / scalars in constructor functions ?
3 participants