Commit
* draft updates
* discuss array API standard
* fix sparse examples so they run
* Deepak's suggestions Co-authored-by: Deepak Cherian <[email protected]>
* link to duck arrays user guide from internals page
* fix various links
* itemized list
* mention dispatching on functions not in the array API standard
* examples of duckarrays
* add intended audience to xarray internals section
* draft page on chunked arrays
* move paragraph on why its called a duck array upwards
* delete section on numpy ufuncs
* explain difference between .values and to_numpy
* strongly prefer to_numpy over values
* recommend to_numpy instead of values in the how do I? page
* clearer about using to_numpy
* merge section on missing features
* remove todense from examples
* whatsnew
* double that Co-authored-by: Deepak Cherian <[email protected]>
* numpy array class clarification Co-authored-by: Deepak Cherian <[email protected]>
* Remove sentence about xarray's internals Co-authored-by: Deepak Cherian <[email protected]>
* array API standard Co-authored-by: Deepak Cherian <[email protected]>
* proper link for sparse.COO type Co-authored-by: Deepak Cherian <[email protected]>
* links to docstrings of array types Co-authored-by: Deepak Cherian <[email protected]>
* don't put variable in parentheses Co-authored-by: Deepak Cherian <[email protected]>
* double backquote formatting Co-authored-by: Deepak Cherian <[email protected]>
* better bracketing Co-authored-by: Deepak Cherian <[email protected]>
* fix list formatting
* add links to glue packages, dask, and cubed
* link to todense method Co-authored-by: Deepak Cherian <[email protected]>
* link to numpy-like arrays page
* link to numpy ufunc docs
* more text about chunkmanagers
* add example of using .to_numpy
* note on ideally not having an entrypoint system
* parallel processing without chunks
* explain the user interface
* how to register the chunkmanager
* show example of .values failing
* link from duck arrays page
* whatsnew
* move whatsnew entry to unreleased version
* capitalization
* fix warning in docs build
* fix a bunch of links
* display API of ChunkManagerEntrypoint class attributes and methods
* improve docstrings in ABC
* add cubed to intersphinx mapping
* link to dask.array as module not class
* typo
* fix bold formatting
* proper docstrings
* mention from_array specifically and link to requirements section of duck array internals page
* add explicit link to cubed
* mention ramba and arkouda
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* py:mod
* Present tense regarding wrapping cubed Co-authored-by: Deepak Cherian <[email protected]>
* add links to cubed
* add references for numpy links in apply_gufunc docstring
* fix some broken links to docstrings

---------

Co-authored-by: Deepak Cherian <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent
0de7761
commit 58096a6
Showing 7 changed files with 489 additions and 16 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,102 @@ | ||
.. currentmodule:: xarray | ||
|
||
.. _internals.chunkedarrays: | ||
|
||
Alternative chunked array types | ||
=============================== | ||
|
||
.. warning:: | ||
|
||
    This is a *highly* experimental feature. Please report any bugs or other difficulties on `xarray's issue tracker <https://github.com/pydata/xarray/issues>`_. | ||
    In particular, see the discussion on `xarray issue #6807 <https://github.com/pydata/xarray/issues/6807>`_. | ||
|
||
Xarray can wrap chunked dask arrays (see :ref:`dask`), but can also wrap any other chunked array type that exposes the correct interface. | ||
This allows us to support using other frameworks for distributed and out-of-core processing, with user code still written as xarray commands. | ||
In particular xarray also supports wrapping :py:class:`cubed.Array` objects | ||
(see `Cubed's documentation <https://tom-e-white.com/cubed/>`_ and the `cubed-xarray package <https://github.com/xarray-contrib/cubed-xarray>`_). | ||
|
||
The basic idea is that by wrapping an array that has an explicit notion of ``.chunks``, xarray can expose control over | ||
the choice of chunking scheme to users via methods like :py:meth:`DataArray.chunk`, while the wrapped array itself | ||
handles the processing of all the chunks. | ||
|
||
Chunked array methods and "core operations" | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
A chunked array needs to meet all the :ref:`requirements for normal duck arrays <internals.duckarrays.requirements>`, but must also | ||
implement additional features. | ||
|
||
Chunked arrays have additional attributes and methods, such as ``.chunks`` and ``.rechunk``. | ||
Furthermore, Xarray dispatches chunk-aware computations across one or more chunked arrays using special functions known | ||
as "core operations". Examples include ``map_blocks``, ``blockwise``, and ``apply_gufunc``. | ||
|
||
The core operations are generalizations of functions first implemented in :py:mod:`dask.array`. | ||
The implementation of these functions is specific to the type of arrays passed to them. For example, when applying the | ||
``map_blocks`` core operation, :py:class:`dask.array.Array` objects must be processed by :py:func:`dask.array.map_blocks`, | ||
whereas :py:class:`cubed.Array` objects must be processed by :py:func:`cubed.map_blocks`. | ||
|
||
In order to use the correct implementation of a core operation for the array type encountered, xarray dispatches to the | ||
corresponding subclass of :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint`, | ||
also known as a "Chunk Manager". Therefore **a full list of the operations that need to be defined is set by the | ||
API of the** :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` **abstract base class**. Note that chunked array | ||
methods are also currently dispatched using this class. | ||
|
||
Chunked array creation is also handled by this class. As chunked array objects have a one-to-one correspondence with | ||
in-memory numpy arrays, it should be possible to create a chunked array from a numpy array by passing the desired | ||
chunking pattern to an implementation of :py:meth:`~xarray.core.parallelcompat.ChunkManagerEntrypoint.from_array`. | ||
|
||
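
The dispatch pattern described above can be illustrated with a small, self-contained toy. This is only a sketch: ``ToyChunkedList``, ``ToyChunkManager``, ``ToyListManager`` and ``get_manager`` are hypothetical names invented for illustration, and the method signatures are deliberately simplified, so they should not be read as the real ``ChunkManagerEntrypoint`` API.

```python
from abc import ABC, abstractmethod


class ToyChunkedList:
    """Hypothetical chunked container: a list of blocks, each a plain list."""

    def __init__(self, blocks):
        self.blocks = [list(block) for block in blocks]

    @property
    def chunks(self):
        # Sizes of the blocks, loosely analogous to ``.chunks`` on a chunked array.
        return tuple(len(block) for block in self.blocks)


class ToyChunkManager(ABC):
    """Hypothetical stand-in for an abstract chunk manager (simplified signatures)."""

    @abstractmethod
    def is_chunked_array(self, data): ...

    @abstractmethod
    def from_array(self, data, chunks): ...

    @abstractmethod
    def map_blocks(self, func, data): ...


class ToyListManager(ToyChunkManager):
    def is_chunked_array(self, data):
        return isinstance(data, ToyChunkedList)

    def from_array(self, data, chunks):
        # Split an in-memory sequence into blocks of at most ``chunks`` items.
        data = list(data)
        return ToyChunkedList(
            data[i : i + chunks] for i in range(0, len(data), chunks)
        )

    def map_blocks(self, func, data):
        # Apply ``func`` independently to every block.
        return ToyChunkedList(func(block) for block in data.blocks)


def get_manager(data, managers):
    """Return the first manager that recognises ``data`` (toy dispatch rule)."""
    for manager in managers:
        if manager.is_chunked_array(data):
            return manager
    raise TypeError(f"no chunk manager recognises {type(data).__name__}")


manager = ToyListManager()
arr = manager.from_array(range(5), chunks=2)
print(arr.chunks)  # (2, 2, 1)

doubled = get_manager(arr, [manager]).map_blocks(
    lambda block: [2 * x for x in block], arr
)
print(doubled.blocks)  # [[0, 2], [4, 6], [8]]
```

The point of the toy is that the calling code never needs to know which array type it holds: it asks each manager whether it recognises the data and then calls that manager's implementation of the operation.
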
.. note:: | ||
|
||
    The :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` abstract base class is mostly just acting as a | ||
    namespace for containing the chunk-aware function primitives. Ideally in the future we would have an API standard | ||
    for chunked array types which codified this structure, making the entrypoint system unnecessary. | ||
|
||
.. currentmodule:: xarray.core.parallelcompat | ||
|
||
.. autoclass:: xarray.core.parallelcompat.ChunkManagerEntrypoint | ||
    :members: | ||
|
||
Registering a new ChunkManagerEntrypoint subclass | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Rather than hard-coding various chunk managers to deal with specific chunked array implementations, xarray uses an | ||
entrypoint system to allow developers of new chunked array implementations to register their corresponding subclass of | ||
:py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint`. | ||
|
||
|
||
To register a new entrypoint you need to add an entry to the ``setup.cfg`` like this:: | ||
|
||
    [options.entry_points] | ||
    xarray.chunkmanagers = | ||
        dask = xarray.core.daskmanager:DaskManager | ||
|
||
See also `cubed-xarray <https://github.com/xarray-contrib/cubed-xarray>`_ for another example. | ||
|
||
To check that the entrypoint has worked correctly, you may find it useful to display the available chunkmanagers using | ||
the internal function :py:func:`~xarray.core.parallelcompat.list_chunkmanagers`. | ||
|
||
.. autofunction:: list_chunkmanagers | ||
|
||
|
||
User interface | ||
~~~~~~~~~~~~~~ | ||
|
||
Once the chunkmanager subclass has been registered, xarray objects wrapping the desired array type can be created in three ways: | ||
|
||
#. By manually passing an array of that type to the :py:class:`~xarray.DataArray` constructor (see the examples for :ref:`numpy-like arrays <userguide.duckarrays>`), | ||
|
||
#. By calling :py:meth:`~xarray.DataArray.chunk`, passing the keyword arguments ``chunked_array_type`` and ``from_array_kwargs``, | ||
|
||
#. By calling :py:func:`~xarray.open_dataset`, passing the keyword arguments ``chunked_array_type`` and ``from_array_kwargs``. | ||
|
||
The latter two methods ultimately call the chunkmanager's implementation of ``.from_array``, to which they pass the ``from_array_kwargs`` dict. | ||
The ``chunked_array_type`` kwarg selects which registered chunkmanager subclass to dispatch to. It defaults to ``'dask'`` | ||
if Dask is installed; otherwise it defaults to the registered chunkmanager when exactly one is registered. | ||
If multiple chunkmanagers are registered in that case, xarray raises an error by default. | ||
|
||
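
The default-selection rule described above can be written out as a small pure-Python sketch. The function ``pick_chunkmanager_name`` is hypothetical and is not xarray's implementation. It treats the presence of ``"dask"`` among the registered names as a stand-in for "Dask is installed", and raising an error when nothing is registered is an assumption, since the text above does not cover that case.

```python
def pick_chunkmanager_name(registered, requested=None):
    """Pick a chunk manager name following the rule described above.

    ``registered`` is an iterable of registered names; ``requested`` plays
    the role of the ``chunked_array_type`` keyword argument.
    """
    names = set(registered)

    # An explicit request always wins, provided it is registered.
    if requested is not None:
        if requested not in names:
            raise ValueError(f"unrecognized chunk manager: {requested!r}")
        return requested

    # Default to dask whenever it is available.
    if "dask" in names:
        return "dask"

    # Otherwise fall back to the only registered manager, if there is just one.
    if len(names) == 1:
        return next(iter(names))

    # Assumption: an empty registry is treated as an error.
    if not names:
        raise ValueError("no chunk managers are registered")

    raise ValueError(
        f"multiple chunk managers registered ({sorted(names)}); "
        "pass one explicitly"
    )


print(pick_chunkmanager_name(["dask", "cubed"]))  # dask
print(pick_chunkmanager_name(["cubed"]))  # cubed
```
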
Parallel processing without chunks | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
To use a parallel array type that does not expose a concept of chunks explicitly, none of the information on this page | ||
is required in principle. Such an array type (e.g. `Ramba <https://github.com/Python-for-HPC/ramba>`_ or | ||
`Arkouda <https://github.com/Bears-R-Us/arkouda>`_) could be wrapped using xarray's existing support for | ||
:ref:`numpy-like "duck" arrays <userguide.duckarrays>`. |