feat(python/adbc_driver_manager): export handles and ingest data through python Arrow PyCapsule interface #1346

jorisvandenbossche · 2023-12-06T15:50:40Z

Addresses #70

This PR adds the Arrow PyCapsule dunder methods to the Handle classes of the low-level interface. That alone already makes it possible to use the low-level interface without pyarrow, through the capsule protocol.

Secondly, the places that accept data (e.g. ingest/bind) now also accept any object that implements the dunders, in addition to the existing hardcoded support for pyarrow.
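
As a rough illustration of what accepting "objects that implement the dunders" can look like, here is a minimal duck-typing sketch. It is not the code from this PR: `pick_arrow_export`, `FakeTable` and `FakeBatch` are hypothetical names, the placeholder strings stand in for real PyCapsules, and the preference order (stream before array) is an assumption of the sketch.

```python
def pick_arrow_export(data):
    """Return the name of the Arrow PyCapsule dunder that `data` offers.

    Hypothetical helper: checking for a stream first and an array second
    is an assumption of this sketch, not necessarily what ADBC does.
    """
    for dunder in ("__arrow_c_stream__", "__arrow_c_array__"):
        if callable(getattr(data, dunder, None)):
            return dunder
    raise TypeError(
        f"{type(data).__name__} does not implement the Arrow PyCapsule protocol"
    )


class FakeTable:
    """Stand-in stream producer; a real one would return a PyCapsule."""

    def __arrow_c_stream__(self, requested_schema=None):
        return "stream-capsule-placeholder"


class FakeBatch:
    """Stand-in array producer; a real one would return two PyCapsules."""

    def __arrow_c_array__(self, requested_schema=None):
        return ("schema-capsule-placeholder", "array-capsule-placeholder")
```

The point of the pattern is that the consumer never imports the producer's library; it only looks for the protocol methods.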

jorisvandenbossche · 2023-12-06T15:55:12Z

python/adbc_driver_manager/adbc_driver_manager/_lib.pyx

+        memcpy(allocated, &self.schema, sizeof(CArrowSchema))
+        self.schema.release = NULL
+        return capsule


We are "moving" the schema here, whereas in nanoarrow I opted for a hard copy of the schema (using nanoarrow's ArrowSchemaDeepCopy).

But I think the only advantage of a hard copy is that you can consume it multiple times? (Or, in the case of nanoarrow-python, that the nanoarrow Schema object is still valid and inspectable after it has been converted to e.g. a pyarrow.Schema.)
For ADBC, I think the use case will be much more "receive the handle and convert it directly, once", given that the Handle object itself isn't useful at all (in contrast to nanoarrow.Schema), so moving here is probably fine?

I think moving makes sense here.
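
To make the trade-off in this thread concrete, the following toy model contrasts the two export strategies. It is only a sketch in plain Python: `ToySchema`, `MovingHandle` and `CopyingHandle` are hypothetical stand-ins for the C struct and the handle classes, and raising on a second export is an assumption of the sketch rather than behaviour shown in the diff.

```python
import copy


class ToySchema:
    """Stand-in for a C ArrowSchema; `release` is None once moved or released."""

    def __init__(self, fmt):
        self.format = fmt
        self.release = "release-callback"  # non-None means the struct is live


class MovingHandle:
    """Export by moving: the handle gives up ownership on first export."""

    def __init__(self, schema):
        self._schema = schema

    def export(self):
        if self._schema.release is None:
            # Assumption of this sketch: refuse to export a moved-from struct.
            raise ValueError("schema was already consumed")
        exported = copy.copy(self._schema)  # shallow copy, analogous to memcpy
        self._schema.release = None         # mark the source as moved-from
        return exported


class CopyingHandle:
    """Export by deep copy: the handle stays valid and can export repeatedly."""

    def __init__(self, schema):
        self._schema = schema

    def export(self):
        return copy.deepcopy(self._schema)
```

With the moving variant, the first `export()` succeeds and a second one fails; with the copying variant, every call yields an independent schema at the cost of a copy each time.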

jorisvandenbossche · 2023-12-06T16:02:38Z

python/adbc_driver_manager/adbc_driver_manager/_lib.pyx

+    def __arrow_c_array__(self, requested_schema=None) -> object:
+        """Consume this object to get a PyCapsule."""
+        if requested_schema is not None:
+            raise NotImplementedError("requested_schema")
+
+        cdef CArrowArray* allocated = <CArrowArray*> malloc(sizeof(CArrowArray))
+        allocated.release = NULL
+        capsule = PyCapsule_New(
+            <void*>allocated, "arrow_array", pycapsule_array_deleter,
+        )
+        memcpy(allocated, &self.array, sizeof(CArrowArray))
+        self.array.release = NULL
+        return capsule


This is actually not used at the moment, because I think none of the ADBC APIs return an ArrowArray (only ArrowSchema or ArrowArrayStream).
This handle is currently only used internally for ingesting data (bind).

So I could also remove __arrow_c_array__, given that it is unused. If we require pyarrow >= 14, I could probably also remove this class entirely, because then we can use the capsule interface for ingesting data.

And I realize now that this implementation is actually wrong: it needs to return two capsules, one for the ArrowArray and one for the ArrowSchema, and this handle only has the array. So I don't think we can add this dunder here.
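
The shape problem described above can be sketched as follows. In the Arrow PyCapsule protocol, `__arrow_c_array__` returns a pair (schema capsule, array capsule), so a producer needs access to both structs. The classes and the `unpack_array_export` helper below are hypothetical, and plain strings stand in for real capsules.

```python
class ArrayWithSchema:
    """Hypothetical producer that owns both a schema and an array."""

    def __init__(self, schema_capsule, array_capsule):
        self._schema_capsule = schema_capsule
        self._array_capsule = array_capsule

    def __arrow_c_array__(self, requested_schema=None):
        if requested_schema is not None:
            raise NotImplementedError("requested_schema")
        # The protocol expects a 2-tuple: schema first, then array.
        return (self._schema_capsule, self._array_capsule)


class ArrayOnlyHandle:
    """Hypothetical handle that only holds the array, as in the diff above."""

    def __init__(self, array_capsule):
        self._array_capsule = array_capsule

    def __arrow_c_array__(self, requested_schema=None):
        # Wrong shape on purpose: a single capsule instead of a pair.
        return self._array_capsule


def unpack_array_export(obj):
    """Call the dunder and check that it returned a (schema, array) pair."""
    result = obj.__arrow_c_array__()
    if not (isinstance(result, tuple) and len(result) == 2):
        raise TypeError(
            "__arrow_c_array__ must return (schema_capsule, array_capsule)"
        )
    return result
```

A consumer written against the protocol would reject the array-only variant, which is why the dunder cannot simply be added to a handle that has no schema.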

jorisvandenbossche · 2023-12-06T16:03:35Z

python/adbc_driver_manager/pyproject.toml

@@ -25,8 +25,8 @@ requires-python = ">=3.9"
 dynamic = ["version"]

 [project.optional-dependencies]
-dbapi = ["pandas", "pyarrow>=8.0.0"]
-test = ["duckdb", "pandas", "pyarrow>=8.0.0", "pytest"]
+dbapi = ["pandas", "pyarrow>=14.0.1"]


Are we OK with bumping this requirement? (I don't know which existing users of the Python ADBC packages might be affected.)

I figured we should bump it just because our official guidance is to upgrade.

Not a huge deal now but if usage grows over time this could be a pain point for pandas.

We can remove it. I'm just not sure whether the requirements can express that you need the fix package if you're on pyarrow < 14.0.1.

I reverted the change that updated the minimum requirement. Strictly speaking this PR doesn't need it, so let's keep the discussion about bumping the minimum requirement separate.

jorisvandenbossche · 2023-12-06T16:05:24Z

python/adbc_driver_manager/tests/test_lowlevel.py

+
+
+@pytest.mark.sqlite
+def test_pycapsule(sqlite):


I further expanded the specific test that David started. In addition (at least if we require pyarrow>=14 for testing), I could also update the other tests above to do the import/export using capsules instead of the current calls to _import_from_c, which would automatically give some more test coverage.

lidavidm

Thanks! Just one small question

lidavidm · 2023-12-07T18:54:53Z

python/adbc_driver_manager/pyproject.toml

@@ -25,8 +25,8 @@ requires-python = ">=3.9"
 dynamic = ["version"]

 [project.optional-dependencies]
-dbapi = ["pandas", "pyarrow>=8.0.0"]
-test = ["duckdb", "pandas", "pyarrow>=8.0.0", "pytest"]
+dbapi = ["pandas", "pyarrow>=14.0.1"]


I figured we should bump it just because our official guidance is to upgrade.

lidavidm · 2023-12-08T19:35:15Z

python/adbc_driver_manager/adbc_driver_manager/_lib.pyx

+        memcpy(allocated, &self.schema, sizeof(CArrowSchema))
+        self.schema.release = NULL
+        return capsule


I think moving makes sense here.


lidavidm

Thanks!

lidavidm and others added 3 commits December 6, 2023 14:45

WIP

df501e1

small clean-up + fix dunder name + some tests

a5878b7

expand test

07da02c

github-actions bot added this to the ADBC Libraries 0.9.0 milestone Dec 6, 2023

jorisvandenbossche commented Dec 6, 2023


jorisvandenbossche added 2 commits December 7, 2023 15:06

ingest data supporting the Arrow PyCapsule protocol

e0fdba2

lint

3856a6c

jorisvandenbossche force-pushed the capsules branch from 8d95ec8 to 3856a6c Compare December 7, 2023 14:15

jorisvandenbossche marked this pull request as ready for review December 7, 2023 21:19

jorisvandenbossche requested a review from lidavidm as a code owner December 7, 2023 21:19

lidavidm approved these changes Dec 8, 2023


jorisvandenbossche added 5 commits December 13, 2023 14:23

undo version bump

40ea168

remove ArrowArrayHandle.__arrow_c_array__

30d4324

accept objects that implement the protocol in the lowlevel api

6f51825

clean-up unused array capsule helper

acf591f

only call dunder if not a handle class (avoid moving for the handle c…

6564ab1

…lasses)

lidavidm approved these changes Dec 13, 2023


lidavidm merged commit 9544887 into apache:main Dec 13, 2023
48 of 49 checks passed

jorisvandenbossche deleted the capsules branch December 13, 2023 20:25

jorisvandenbossche changed the title ~~feat(python/adbc_driver_manager): export handles through python Arrow Capsule interface~~ feat(python/adbc_driver_manager): export handles and ingest data through python Arrow Capsule interface Jan 19, 2024

jorisvandenbossche changed the title ~~feat(python/adbc_driver_manager): export handles and ingest data through python Arrow Capsule interface~~ feat(python/adbc_driver_manager): export handles and ingest data through python Arrow PyCapsule interface Jan 19, 2024

jorisvandenbossche mentioned this pull request Jan 19, 2024

[Python] Promote usage of the Arrow PyCapsule Protocol (for the C Data Inteface) apache/arrow#39195

Open

8 tasks

lidavidm mentioned this pull request Jan 24, 2024

python/adbc_driver_manager: use PyCapsule for handles to C structs #70

Closed

lidavidm mentioned this pull request Feb 29, 2024

feat(python/adbc_driver_manager): experiment with using PyCapsules #702

Closed
