-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(python): Support the PyCapsule protocol #318
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #318 +/- ##
==========================================
- Coverage 87.01% 85.29% -1.73%
==========================================
Files 70 3 -67
Lines 10574 374 -10200
==========================================
- Hits 9201 319 -8882
+ Misses 1373 55 -1318 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this! I think you probably want to use the Schema
, Array
, and ArrayStream
constructors with the capsule as the base
instead of "moving" into the "holder" helpers. We might be able to get rid of the holders entirely now that the capsules are defined (once there's code for allocating them).
if hasattr(obj, "__arrow_c_schema__"): | ||
return Schema._import_from_c_capsule(obj.__arrow_c_schema__()) | ||
|
||
# for pyarrow < 14.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point we should maybe remove this logic (or do stricter check...maybe using type().__name__
or something). A name-based check would support older pyarrow, anyway...I think it would be reasonable for any other package that were using _export_to_c()
to update and use the new dunder method to be compatible here.
return self.schema.__arrow_c_schema__() | ||
|
||
|
||
class ArrayWrapper: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Array(Stream)Wrapper
should also implement __arrow_c_schema__
, right? Even if not used here, it might be good for future readers who end up here to see the example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, "Will do exporting next."
python/src/nanoarrow/_lib.pyx
Outdated
|
||
c_schema = <ArrowSchema*> PyCapsule_GetPointer(schema_capsule, 'arrow_schema') | ||
|
||
out = Schema.allocate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if what you want here is more like:
cdef Schema schema = Schema(schema_capsule, <uintptr_t>PyCapsule_GetPointer(schema_capsule, 'arrow_schema'))
(i.e., no need to "move" here...base
can be a capsule, which does basically the same thing as the SchemaHolder
)
That's fine for me as well, no strong opinion here. I actually started that way, but thought that moving the struct as consumer so we own the memory would be considered good practice (nanoarrow is of course a bit of a peculiar consumer). But actually for nested ones that still only moves the top-level one, so it's not that this guarantees anything .. |
Indeed, the PyCapsule also ensures proper deallocation like the Holder class, so will try a variant with keeping the capsule around as the |
Moving the struct is definitely good practice, but the Capsule interface forces that on the producer (whereas with the |
For exporting (the dunder methods), do we want to move in that case? I think right now we don't really expose the functionality to create a Schema/Array .. object from "scratch", owning the memory, but in theory nanoarrow C supports that, so we could eventually have that. For such a case, we should probably create a new capsule with a moved struct? Or even in general, if you want that a nanoarrow.Array can be consumed multiple times, we need to export a new struct, otherwise the one of the consumers could already mark it as released. |
That is a very complicated question, although I think the renaming of classes will help here. For the current For Schema, you can use Streams I think you can just move...they are already stateful and so I think there is less opportunity for somebody to misuse them in that way. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few details!
python/src/nanoarrow/_lib.pyx
Outdated
from cpython cimport Py_buffer | ||
from nanoarrow_c cimport * | ||
from libc.string cimport memcpy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be removed now?
@@ -428,6 +447,33 @@ cdef class Array: | |||
self._ptr = <ArrowArray*>addr | |||
self._schema = schema | |||
|
|||
@staticmethod | |||
def _import_from_c_capsule(schema_capsule, array_capsule): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the ordering everywhere else is array
, schema
?
Follow-up on #318 Exports the different objects using different mechanism: - ArrowSchema -> deep copy - ArrowArray -> shallow copy of the struct, update private_data/release to ref count the original's base - ArrowArrayStream -> move the struct (turning the source nanoarrow.ArrayStream object as released, but that's fine)
First commit with just importing for now. Will do exporting next.