[Python] Rewrite pyarrow.jvm using the C data interface #44860

Open
asfimport opened this issue Oct 14, 2021 · 16 comments

@asfimport
Collaborator

asfimport commented Oct 14, 2021

The pyarrow.jvm module is currently a custom-written bridge between PyArrow and Arrow Java, with limited datatype support. Now that Java implements the C data interface (see ARROW-12965), we should be able to simplify the code while making it more general.
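
For context, the C data interface comes down to two plain C structs, `ArrowSchema` and `ArrowArray`, whose addresses the producer and the consumer exchange. The following is only an illustrative sketch of that layout using Python's `ctypes`, with field types taken from the struct definitions in the Arrow C data interface specification. It is not how pyarrow.jvm or Arrow Java implement the bridge.

```python
import ctypes


class ArrowSchema(ctypes.Structure):
    """ctypes mirror of `struct ArrowSchema` (describes a data type)."""


class ArrowArray(ctypes.Structure):
    """ctypes mirror of `struct ArrowArray` (describes the data itself)."""


# Fields are assigned after the class statements because both structs
# contain pointers to their own type.
ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_void_p),  # binary blob, not NUL-terminated
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowSchema))),
    ("private_data", ctypes.c_void_p),
]

ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowArray))),
    ("private_data", ctypes.c_void_p),
]

# Producer side: fill in a struct and hand out its address as an integer.
schema = ArrowSchema()
schema.format = b"i"  # "i" is the format string for int32
schema_address = ctypes.addressof(schema)

# Consumer side: reinterpret the integer as the same struct, without copying.
view = ArrowSchema.from_address(schema_address)
print(view.format)  # b'i'
```

Because only integer addresses cross the language boundary, a bridge built this way would not need per-datatype conversion code on the Python side.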

Also, we should re-enable the conda-python-jpype build somewhere, for example in the Crossbow nightly builds.

Reporter: Antoine Pitrou / @pitrou

Related issues:

Note: This issue was originally created as ARROW-14319. Please see the migration documentation for further details.

@asfimport
Collaborator Author

Roee Shlomo / @roee88:
I assume that backward compatibility is not required for internal-use methods (i.e., those starting with an underscore). What about jvm_buffer: should it just be kept as is?

@asfimport
Collaborator Author

Antoine Pitrou / @pitrou:
jvm_buffer should probably be kept, yes. We may also want to deprecate it (it's not obvious it's useful in isolation).

As for your other question: indeed, methods starting with an underscore do not enter into backward compatibility concerns.

@asfimport
Collaborator Author

Roee Shlomo / @roee88:
I suspect that a better approach would be to create a new module and keep pyarrow.jvm as is:

  1. Backward compatibility seems like a challenge. A reference to org.apache.arrow.c must be provided so that ArrowSchema, ArrowArray and the various import/export functions are available on the Python side. In addition, all C data interface methods require an allocator as a parameter. Neither is provided in the current pyarrow.jvm API.
  2. The current pyarrow.jvm module works with a pure Java build of Arrow Java, while the C data interface requires building a small JNI library. Unless you rely on end users to build the Java jar on their own, packaging the JNI lib will be required for all platforms targeted by pyarrow.
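
To make point 1 concrete, here is a rough sketch of what a C data interface based conversion could look like, loosely following the pattern used in the Java-Python integration tests. The function name `jvm_vector_to_pyarrow` and its parameters are hypothetical. `java_c` is assumed to be the `org.apache.arrow.c` package as exposed by JPype, `allocator` is assumed to be an Arrow Java `BufferAllocator`, and the JNI library from point 2 is assumed to be loadable by the JVM. Note that `_import_from_c` is a private PyArrow method.

```python
def jvm_vector_to_pyarrow(jvm_vector, allocator, java_c, dictionary_provider=None):
    """Convert an Arrow Java FieldVector into a pyarrow.Array.

    Hypothetical helper, not part of pyarrow.jvm. Unlike the current API,
    it needs an allocator and a handle on org.apache.arrow.c.
    """
    # Imported lazily so that merely defining the helper does not need PyArrow.
    import pyarrow as pa
    from pyarrow.cffi import ffi

    # Allocate empty C structs on the Python side; Java fills them in.
    c_schema = ffi.new("struct ArrowSchema*")
    c_array = ffi.new("struct ArrowArray*")
    schema_ptr = int(ffi.cast("uintptr_t", c_schema))
    array_ptr = int(ffi.cast("uintptr_t", c_array))

    # Java exports the vector into the structs located at these addresses.
    java_c.Data.exportVector(
        allocator,
        jvm_vector,
        dictionary_provider,
        java_c.ArrowArray.wrap(array_ptr),
        java_c.ArrowSchema.wrap(schema_ptr),
    )

    # PyArrow imports the exported structs and takes over their release.
    return pa.Array._import_from_c(array_ptr, schema_ptr)
```

The two extra arguments (`allocator` and `java_c`) are exactly what the existing `pyarrow.jvm` functions do not receive, which is why keeping their signatures unchanged looks difficult.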

@asfimport
Collaborator Author

Antoine Pitrou / @pitrou:
cc @xhochy

@asfimport
Collaborator Author

Roee Shlomo / @roee88:
@pitrou feel free to reuse code from my attempt the other day (https://gist.github.com/roee88/4aa7dfeceb2d8c3d8868ed8465ebf561) if that helps. It's based on the Java-Python integration test code for ARROW-14374 (with the original test_jvm.py tests updated).

@asfimport
Collaborator Author

Antoine Pitrou / @pitrou:
@amol- ^^

@asfimport
Collaborator Author

Todd Farmer / @toddfarmer:
This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned per project policy. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.

@vibhatha
Collaborator

@jorisvandenbossche is there an ongoing effort to integrate the C data interface into pyarrow.jvm?

@jorisvandenbossche
Member

I am not aware of anyone actually working on this; there is only this issue, which tracks that we should do it at some point.

@vibhatha
Collaborator

Would it be okay if I work on this?

@jorisvandenbossche
Member

Certainly!

@vibhatha
Collaborator

take

vibhatha removed their assignment Jun 2, 2024
@vibhatha
Collaborator

vibhatha commented Jun 2, 2024

@jorisvandenbossche I am removing my assignment since my focus has changed and I couldn't attend to this issue in a timely manner.

assignUser transferred this issue from apache/arrow Nov 26, 2024
@pitrou
Member

pitrou commented Nov 26, 2024

Uh, oh. This is a PyArrow issue even though it also pertains to the Java implementation (as in: calls into Arrow Java APIs). @assignUser

kou transferred this issue from apache/arrow-java Nov 27, 2024
@kou
Member

kou commented Nov 27, 2024

I re-transferred to apache/arrow.

@assignUser
Member

Sorry ^^
