[Python][Docs] Document the Arrow PyCapsule protocol in the 'extending pyarrow' section of the Python docs
jorisvandenbossche committed Dec 12, 2023
1 parent 087fc8f commit f78084d
Showing 2 changed files with 34 additions and 0 deletions.
2 changes: 2 additions & 0 deletions docs/source/format/CDataInterface/PyCapsuleInterface.rst
@@ -16,6 +16,8 @@
.. under the License.
.. _arrow-pycapsule-interface:

=============================
The Arrow PyCapsule Interface
=============================
32 changes: 32 additions & 0 deletions docs/source/python/extending_types.rst
@@ -21,6 +21,38 @@
Extending pyarrow
=================

Controlling conversion to (Py)Arrow with the PyCapsule Interface
----------------------------------------------------------------

The :ref:`Arrow C data interface <c-data-interface>` allows moving Arrow data between
different implementations of Arrow. This is a generic, cross-language interface not
specific to Python, but for Python libraries this interface is extended with a
Python-specific layer: :ref:`arrow-pycapsule-interface`.

This Python interface ensures that different libraries that support the C data interface
can recognize each other's objects and export Arrow data structures in a standard way.

If you have a library providing data structures that hold Arrow-compatible data
under the hood, you can implement the following dunder methods on those objects:

- ``__arrow_c_schema__`` for schema or type-like objects.
- ``__arrow_c_array__`` for arrays and record batches (contiguous tables).
- ``__arrow_c_stream__`` for chunked tables or streams of data.

Those methods return `PyCapsule <https://docs.python.org/3/c-api/capsule.html>`__
objects, and more details on the exact semantics can be found in the
:ref:`specification <arrow-pycapsule-interface>`.

When your data structures define those dunder methods, the pyarrow constructors
(such as :func:`pyarrow.array` or :func:`pyarrow.table`) recognize those objects as
supporting this protocol and convert them to PyArrow data structures zero-copy. The
same can hold for any other library that supports this protocol when ingesting data.
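
As a minimal sketch of the producer side, the class below forwards
``__arrow_c_array__`` to an object it wraps. ``MyColumn`` and its ``storage``
attribute are hypothetical names used only for illustration, and the sketch assumes
the wrapped object already implements the protocol (as a ``pyarrow.Array`` does in
recent PyArrow releases). A library that owns its own Arrow buffers would build the
capsules itself, as described in the specification.

```python
class MyColumn:
    """Hypothetical container whose values live in an Arrow-compatible object."""

    def __init__(self, storage):
        # Assumption: `storage` already implements the PyCapsule interface,
        # for example a pyarrow.Array.
        self._storage = storage

    def __arrow_c_array__(self, requested_schema=None):
        # Forward the export request unchanged. Per the specification the
        # result is a pair of PyCapsules: (schema capsule, array capsule).
        return self._storage.__arrow_c_array__(requested_schema)
```

With a PyArrow version that supports the protocol installed, passing such an object
to :func:`pyarrow.array` (for example ``pa.array(MyColumn(pa.array([1, 2, 3])))``)
should then yield a regular PyArrow array.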

Similarly, if your library has functions that accept user-provided data, you can add
support for this protocol by checking for the presence of those dunder methods, and
thereby accept any Arrow data (instead of hardcoding support for a specific
Arrow producer such as PyArrow).
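
The presence check on the consumer side could look roughly like the sketch below.
The helper name ``arrow_export_kind`` and the order in which the methods are probed
(streams first, then arrays, then schemas) are assumptions made for this
illustration and are not prescribed by the protocol.

```python
def arrow_export_kind(obj):
    """Return "stream", "array" or "schema" depending on which PyCapsule
    export method `obj` defines, or None if it defines none of them."""
    candidates = (
        ("stream", "__arrow_c_stream__"),
        ("array", "__arrow_c_array__"),
        ("schema", "__arrow_c_schema__"),
    )
    for kind, method_name in candidates:
        # Only the presence of the dunder method is checked here;
        # no data is exported yet.
        if hasattr(obj, method_name):
            return kind
    return None
```

A function accepting user data can branch on the returned value and raise a
``TypeError`` when it is ``None``, so that input from any Arrow producer is accepted
rather than input from one specific library.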

.. _arrow_array_protocol:

Controlling conversion to pyarrow.Array with the ``__arrow_array__`` protocol