Remove unvetted DataTree methods (pydata#9585)
* Remove unvetted DataTree methods

As
[discussed](https://docs.google.com/presentation/d/1zBjEsihBhK_U972jxHwaAZBbzS1-hd3aDLnO9uu2Ob4/edit#slide=id.g3087b787633_13_0)
in the last DataTree meeting, this PR deletes the many Dataset methods
that were copied onto DataTree without unit tests, along with a few that
are not implemented properly yet, e.g.,

1. Arithmetic methods were removed, because `DataTree + Dataset` should
   probably raise an error.
2. Indexing and aggregation methods were removed, because these should
   allow for dimensions that are missing only on some nodes.
3. The untested `map_over_subtree_inplace` and `render` methods were
   removed.
4. A few other methods (e.g., `merge` and `plot`) that were only
   implemented by raising `NotImplementedError` are entirely removed
   instead.
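
To illustrate the second point, here is a minimal sketch of an aggregation that tolerates a dimension missing on some nodes. It is a toy model built on plain dictionaries, not the xarray API: the `tree` layout and the `mean_over` helper are hypothetical names invented for this example, and each node is assumed to map a dimension name to the 1-D values along it.

```python
from statistics import fmean

# Toy stand-in for a tree: each path maps to {dimension name: 1-D values}.
# This is NOT the xarray API; the layout and names here are hypothetical.
tree = {
    "/": {"time": [0.0, 1.0]},
    "/coarse": {"x": [1.0, 3.0]},
    "/fine": {"x": [1.0, 2.0, 3.0, 6.0]},
}


def mean_over(tree, dim):
    """Average `dim` on every node that has it; leave other nodes unchanged."""
    result = {}
    for path, node in tree.items():
        if dim in node:
            reduced = dict(node)  # shallow copy so the input is not mutated
            reduced[dim] = fmean(node[dim])
            result[path] = reduced
        else:
            # The dimension is absent on this node, so skip it instead of raising.
            result[path] = node
    return result


means = mean_over(tree, "x")
```

In this sketch the root node has no `x` dimension and is passed through untouched, while `/coarse` and `/fine` are each reduced along their own local `x`. Whether the real methods should skip such nodes or handle them some other way is a design question this PR leaves open.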

* groups docstring

* comment out removed DataTree methods

* update quick overview on DataTree

* doc fixes suggested by Tom
shoyer authored Oct 7, 2024
1 parent 64e60cf commit 1d7c0f6
Showing 4 changed files with 190 additions and 228 deletions.
297 changes: 149 additions & 148 deletions doc/api.rst
@@ -705,16 +705,16 @@ Pathlib-like Interface
DataTree.parents
DataTree.relative_to

Missing:
.. Missing:
..
.. ..
``DataTree.glob``
``DataTree.joinpath``
``DataTree.with_name``
``DataTree.walk``
``DataTree.rename``
``DataTree.replace``
.. ``DataTree.glob``
.. ``DataTree.joinpath``
.. ``DataTree.with_name``
.. ``DataTree.walk``
.. ``DataTree.rename``
.. ``DataTree.replace``
DataTree Contents
-----------------
@@ -725,17 +725,18 @@ Manipulate the contents of all nodes in a ``DataTree`` simultaneously.
:toctree: generated/

DataTree.copy
DataTree.assign_coords
DataTree.merge
DataTree.rename
DataTree.rename_vars
DataTree.rename_dims
DataTree.swap_dims
DataTree.expand_dims
DataTree.drop_vars
DataTree.drop_dims
DataTree.set_coords
DataTree.reset_coords

.. DataTree.assign_coords
.. DataTree.merge
.. DataTree.rename
.. DataTree.rename_vars
.. DataTree.rename_dims
.. DataTree.swap_dims
.. DataTree.expand_dims
.. DataTree.drop_vars
.. DataTree.drop_dims
.. DataTree.set_coords
.. DataTree.reset_coords
DataTree Node Contents
----------------------
@@ -760,129 +761,129 @@ Compare one ``DataTree`` object to another.
DataTree.equals
DataTree.identical

Indexing
--------

Index into all nodes in the subtree simultaneously.

.. autosummary::
:toctree: generated/

DataTree.isel
DataTree.sel
DataTree.drop_sel
DataTree.drop_isel
DataTree.head
DataTree.tail
DataTree.thin
DataTree.squeeze
DataTree.interp
DataTree.interp_like
DataTree.reindex
DataTree.reindex_like
DataTree.set_index
DataTree.reset_index
DataTree.reorder_levels
DataTree.query

..
Missing:
``DataTree.loc``


Missing Value Handling
----------------------

.. autosummary::
:toctree: generated/

DataTree.isnull
DataTree.notnull
DataTree.combine_first
DataTree.dropna
DataTree.fillna
DataTree.ffill
DataTree.bfill
DataTree.interpolate_na
DataTree.where
DataTree.isin

Computation
-----------

Apply a computation to the data in all nodes in the subtree simultaneously.

.. autosummary::
:toctree: generated/

DataTree.map
DataTree.reduce
DataTree.diff
DataTree.quantile
DataTree.differentiate
DataTree.integrate
DataTree.map_blocks
DataTree.polyfit
DataTree.curvefit

Aggregation
-----------

Aggregate data in all nodes in the subtree simultaneously.

.. autosummary::
:toctree: generated/

DataTree.all
DataTree.any
DataTree.argmax
DataTree.argmin
DataTree.idxmax
DataTree.idxmin
DataTree.max
DataTree.min
DataTree.mean
DataTree.median
DataTree.prod
DataTree.sum
DataTree.std
DataTree.var
DataTree.cumsum
DataTree.cumprod

ndarray methods
---------------

Methods copied from :py:class:`numpy.ndarray` objects, here applying to the data in all nodes in the subtree.

.. autosummary::
:toctree: generated/

DataTree.argsort
DataTree.astype
DataTree.clip
DataTree.conj
DataTree.conjugate
DataTree.round
DataTree.rank

Reshaping and reorganising
--------------------------

Reshape or reorganise the data in all nodes in the subtree.

.. autosummary::
:toctree: generated/

DataTree.transpose
DataTree.stack
DataTree.unstack
DataTree.shift
DataTree.roll
DataTree.pad
DataTree.sortby
DataTree.broadcast_like
.. Indexing
.. --------
.. Index into all nodes in the subtree simultaneously.
.. .. autosummary::
.. :toctree: generated/
.. DataTree.isel
.. DataTree.sel
.. DataTree.drop_sel
.. DataTree.drop_isel
.. DataTree.head
.. DataTree.tail
.. DataTree.thin
.. DataTree.squeeze
.. DataTree.interp
.. DataTree.interp_like
.. DataTree.reindex
.. DataTree.reindex_like
.. DataTree.set_index
.. DataTree.reset_index
.. DataTree.reorder_levels
.. DataTree.query
.. ..
.. Missing:
.. ``DataTree.loc``
.. Missing Value Handling
.. ----------------------
.. .. autosummary::
.. :toctree: generated/
.. DataTree.isnull
.. DataTree.notnull
.. DataTree.combine_first
.. DataTree.dropna
.. DataTree.fillna
.. DataTree.ffill
.. DataTree.bfill
.. DataTree.interpolate_na
.. DataTree.where
.. DataTree.isin
.. Computation
.. -----------
.. Apply a computation to the data in all nodes in the subtree simultaneously.
.. .. autosummary::
.. :toctree: generated/
.. DataTree.map
.. DataTree.reduce
.. DataTree.diff
.. DataTree.quantile
.. DataTree.differentiate
.. DataTree.integrate
.. DataTree.map_blocks
.. DataTree.polyfit
.. DataTree.curvefit
.. Aggregation
.. -----------
.. Aggregate data in all nodes in the subtree simultaneously.
.. .. autosummary::
.. :toctree: generated/
.. DataTree.all
.. DataTree.any
.. DataTree.argmax
.. DataTree.argmin
.. DataTree.idxmax
.. DataTree.idxmin
.. DataTree.max
.. DataTree.min
.. DataTree.mean
.. DataTree.median
.. DataTree.prod
.. DataTree.sum
.. DataTree.std
.. DataTree.var
.. DataTree.cumsum
.. DataTree.cumprod
.. ndarray methods
.. ---------------
.. Methods copied from :py:class:`numpy.ndarray` objects, here applying to the data in all nodes in the subtree.
.. .. autosummary::
.. :toctree: generated/
.. DataTree.argsort
.. DataTree.astype
.. DataTree.clip
.. DataTree.conj
.. DataTree.conjugate
.. DataTree.round
.. DataTree.rank
.. Reshaping and reorganising
.. --------------------------
.. Reshape or reorganise the data in all nodes in the subtree.
.. .. autosummary::
.. :toctree: generated/
.. DataTree.transpose
.. DataTree.stack
.. DataTree.unstack
.. DataTree.shift
.. DataTree.roll
.. DataTree.pad
.. DataTree.sortby
.. DataTree.broadcast_like
IO / Conversion
===============
@@ -961,10 +962,10 @@ DataTree methods
DataTree.to_netcdf
DataTree.to_zarr

..
.. ..
Missing:
``open_mfdatatree``
.. Missing:
.. ``open_mfdatatree``
Coordinates objects
===================
@@ -1476,10 +1477,10 @@ Advanced API
backends.list_engines
backends.refresh_engines

..
.. ..
Missing:
``DataTree.set_close``
.. Missing:
.. ``DataTree.set_close``
Default, pandas-backed indexes built-in Xarray:

26 changes: 16 additions & 10 deletions doc/getting-started-guide/quick-overview.rst
@@ -314,23 +314,29 @@ And you can get a copy of just the node local values of :py:class:`~xarray.Datas
ds_node_local = dt["simulation/coarse"].to_dataset(inherited=False)
ds_node_local
Operations map over subtrees, so we can take a mean over the ``x`` dimension of both the ``fine`` and ``coarse`` groups just by:
.. note::

.. ipython:: python
We intend to eventually implement most :py:class:`~xarray.Dataset` methods
(indexing, aggregation, arithmetic, etc) on :py:class:`~xarray.DataTree`
objects, but many methods have not been implemented yet.

avg = dt["simulation"].mean(dim="x")
avg
.. Operations map over subtrees, so we can take a mean over the ``x`` dimension of both the ``fine`` and ``coarse`` groups just by:
Here the ``"x"`` dimension used is always the one local to that subgroup.
.. .. ipython:: python
.. avg = dt["simulation"].mean(dim="x")
.. avg
You can do almost everything you can do with :py:class:`~xarray.Dataset` objects with :py:class:`~xarray.DataTree` objects
(including indexing and arithmetic), as operations will be mapped over every subgroup in the tree.
This allows you to work with multiple groups of non-alignable variables at once.
.. Here the ``"x"`` dimension used is always the one local to that subgroup.
.. note::
If all of your variables are mutually alignable (i.e. they live on the same
.. You can do almost everything you can do with :py:class:`~xarray.Dataset` objects with :py:class:`~xarray.DataTree` objects
.. (including indexing and arithmetic), as operations will be mapped over every subgroup in the tree.
.. This allows you to work with multiple groups of non-alignable variables at once.
.. tip::

If all of your variables are mutually alignable (i.e., they live on the same
grid, such that every common dimension name maps to the same length), then
you probably don't need :py:class:`xarray.DataTree`, and should consider
just sticking with :py:class:`xarray.Dataset`.