Commit
add docs about how to consume objects through the protocol in pyarrow
jorisvandenbossche committed Mar 29, 2024
1 parent 3bb8fc4 commit fe0b542
Showing 1 changed file with 23 additions and 2 deletions.
25 changes: 23 additions & 2 deletions docs/source/python/extending_types.rst
@@ -37,14 +37,14 @@ under the hood, you can implement the following methods on those objects:

 - ``__arrow_c_schema__`` for schema or type-like objects.
 - ``__arrow_c_array__`` for arrays and record batches (contiguous tables).
-- ``__arrow_c_stream__`` for chunked tables or streams of data.
+- ``__arrow_c_stream__`` for chunked arrays or tables, or streams of data.

 Those methods return `PyCapsule <https://docs.python.org/3/c-api/capsule.html>`__
 objects, and more details on the exact semantics can be found in the
 :ref:`specification <arrow-pycapsule-interface>`.

 When your data structures have those methods defined, the PyArrow constructors
-(such as :func:`pyarrow.array` or :func:`pyarrow.table`) will recognize those objects as
+(see below) will recognize those objects as
 supporting this protocol, and convert them to PyArrow data structures zero-copy. And the
 same can be true for any other library supporting this protocol on ingesting data.

@@ -53,6 +53,27 @@ support for this protocol by checking for the presence of those methods, and
 therefore accept any Arrow data (instead of hardcoding support for a specific
 Arrow producer such as PyArrow).

+For consuming data through this protocol with PyArrow, the following constructors
+can be used to create the various PyArrow objects:
+
++----------------------------+-----------------------------------------------+--------------------+
+| Result class               | PyArrow constructor                           | Supported protocol |
++============================+===============================================+====================+
+| :class:`Array`             | :func:`pyarrow.array`                         | array              |
++----------------------------+-----------------------------------------------+--------------------+
+| :class:`ChunkedArray`      | :func:`pyarrow.chunked_array`                 | array, stream      |
++----------------------------+-----------------------------------------------+--------------------+
+| :class:`RecordBatch`       | :func:`pyarrow.record_batch`                  | array              |
++----------------------------+-----------------------------------------------+--------------------+
+| :class:`Table`             | :func:`pyarrow.table`                         | array, stream      |
++----------------------------+-----------------------------------------------+--------------------+
+| :class:`RecordBatchReader` | :meth:`pyarrow.RecordBatchReader.from_stream` | stream             |
++----------------------------+-----------------------------------------------+--------------------+
+| :class:`Field`             | :func:`pyarrow.field`                         | schema             |
++----------------------------+-----------------------------------------------+--------------------+
+| :class:`Schema`            | :func:`pyarrow.schema`                        | schema             |
++----------------------------+-----------------------------------------------+--------------------+
+
 .. _arrow_array_protocol:

 Controlling conversion to pyarrow.Array with the ``__arrow_array__`` protocol