Commit

Merge branch 'main' into feature/astropy-units-support
keewis authored Nov 5, 2024
2 parents 37d3510 + 0384363 commit 408a023
Showing 20 changed files with 598 additions and 152 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/pypi-release.yaml
@@ -88,7 +88,7 @@ jobs:
path: dist
- name: Publish package to TestPyPI
if: github.event_name == 'push'
uses: pypa/gh-action-pypi-publish@v1.10.3
uses: pypa/gh-action-pypi-publish@v1.11.0
with:
repository_url: https://test.pypi.org/legacy/
verbose: true
@@ -111,6 +111,6 @@ jobs:
name: releases
path: dist
- name: Publish package to PyPI
uses: pypa/gh-action-pypi-publish@v1.10.3
uses: pypa/gh-action-pypi-publish@v1.11.0
with:
verbose: true
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -13,7 +13,7 @@ repos:
- id: mixed-line-ending
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: 'v0.6.9'
rev: 'v0.7.2'
hooks:
- id: ruff-format
- id: ruff
@@ -25,7 +25,7 @@ repos:
exclude: "generate_aggregations.py"
additional_dependencies: ["black==24.8.0"]
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.11.2
rev: v1.13.0
hooks:
- id: mypy
# Copied from setup.cfg
8 changes: 8 additions & 0 deletions ci/requirements/environment.yml
@@ -36,6 +36,14 @@ dependencies:
- pre-commit
- pyarrow # pandas raises a deprecation warning without this, breaking doctests
- pydap
# start pydap server dependencies, can be removed if pydap-server is available
- gunicorn
- PasteDeploy
- docopt-ng
- Webob
- Jinja2
- beautifulsoup4
# end pydap server dependencies
- pytest
- pytest-cov
- pytest-env
3 changes: 3 additions & 0 deletions doc/getting-started-guide/faq.rst
@@ -146,6 +146,9 @@ for conflicts between ``attrs`` when combining arrays and datasets, unless
explicitly requested with the option ``compat='identical'``. The guiding
principle is that metadata should not be allowed to get in the way.

In general, xarray relies on the capabilities of the backends for reading and writing
attributes, which has some implications for roundtripping. One such inconsistency is that a size-1 list will roundtrip as a single element (for netCDF4-based backends).
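The size-1 behaviour can be pictured with a toy sketch (plain Python, not the actual netCDF4 backend code; `store_attr`/`read_attr` are hypothetical names): the backend stores attribute values as arrays, and unpacks size-1 arrays on read.

```python
import numpy as np

def store_attr(value):
    # writing: any sequence is coerced to a 1-d array
    return np.atleast_1d(np.asarray(value))

def read_attr(stored):
    # reading: a size-1 array comes back as a plain scalar
    return stored.item() if stored.size == 1 else stored

# a size-1 list does not survive the roundtrip as a list
roundtripped = read_attr(store_attr([1]))
```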

What other netCDF related Python libraries should I know about?
---------------------------------------------------------------

3 changes: 2 additions & 1 deletion doc/user-guide/data-structures.rst
@@ -40,7 +40,8 @@ alignment, building on the functionality of the ``index`` found on a pandas
DataArray objects also can have a ``name`` and can hold arbitrary metadata in
the form of their ``attrs`` property. Names and attributes are strictly for
users and user-written code: xarray makes no attempt to interpret them, and
propagates them only in unambiguous cases (see FAQ, :ref:`approach to metadata`).
For reading and writing attributes, xarray relies on the capabilities of the
supported backends.

.. _creating a dataarray:
9 changes: 9 additions & 0 deletions doc/user-guide/groupby.rst
@@ -294,6 +294,15 @@ is identical to
ds.resample(time=TimeResampler("ME"))
The :py:class:`groupers.UniqueGrouper` accepts an optional ``labels`` kwarg that is not present
in :py:meth:`DataArray.groupby` or :py:meth:`Dataset.groupby`.
Specifying ``labels`` is required when grouping by a lazy array type (e.g. dask or cubed).
The ``labels`` are used to construct the output coordinate (say, for a reduction), and
aggregations are run only over the specified labels.
You may also use ``labels`` to specify the order in which groups are iterated;
that order is preserved in the output.
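As a rough illustration of the ``labels`` semantics described above (a plain-Python sketch with a hypothetical `groupby_mean` helper, not xarray's implementation): aggregations run only over the requested labels, and the output follows the label order.

```python
def groupby_mean(values, groups, labels):
    """Mean of ``values`` per group, restricted to and ordered by ``labels``."""
    sums = {label: 0.0 for label in labels}
    counts = {label: 0 for label in labels}
    for value, group in zip(values, groups):
        if group in sums:  # groups outside ``labels`` are dropped
            sums[group] += value
            counts[group] += 1
    # the output "coordinate" preserves the order given by ``labels``
    return {
        label: sums[label] / counts[label] if counts[label] else float("nan")
        for label in labels
    }

# group "c" is not in ``labels`` and is dropped; keys follow the label order
result = groupby_mean([1.0, 2.0, 3.0, 4.0], ["a", "b", "a", "c"], labels=["b", "a"])
```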


.. _groupby.multiple:

Grouping by multiple variables
29 changes: 19 additions & 10 deletions doc/whats-new.rst
@@ -23,26 +23,43 @@ New Features
~~~~~~~~~~~~
- Added :py:meth:`DataTree.persist` method (:issue:`9675`, :pull:`9682`).
By `Sam Levang <https://github.com/slevang>`_.
- Added ``write_inherited_coords`` option to :py:meth:`DataTree.to_netcdf`
and :py:meth:`DataTree.to_zarr` (:pull:`9677`).
By `Stephan Hoyer <https://github.com/shoyer>`_.
- Support lazy grouping by dask arrays, and allow specifying ordered groups with ``UniqueGrouper(labels=["a", "b", "c"])``
(:issue:`2852`, :issue:`757`).
By `Deepak Cherian <https://github.com/dcherian>`_.

Breaking changes
~~~~~~~~~~~~~~~~


Deprecations
~~~~~~~~~~~~

- Grouping by a chunked array (e.g. dask or cubed) currently eagerly loads that variable into
  memory. This behaviour is deprecated. If eager loading was intended, please load such arrays
  manually using ``.load()`` or ``.compute()``. Otherwise, pass ``eagerly_compute_group=False``
  and provide the expected group labels using the ``labels`` kwarg to a grouper object such as
  :py:class:`groupers.UniqueGrouper` or :py:class:`groupers.BinGrouper`.
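The migration path can be pictured with a toy stand-in for a chunked array (hypothetical `LazyArray` and `resolve_group_labels` names; this is not xarray's internal code):

```python
class LazyArray:
    """Toy stand-in for a chunked array (e.g. dask or cubed)."""

    def __init__(self, data):
        self._data = data
        self.computed = False

    def compute(self):
        self.computed = True
        return self._data

def resolve_group_labels(group, eagerly_compute_group=True, labels=None):
    # deprecated behaviour: load the group variable to discover its labels
    if eagerly_compute_group:
        return sorted(set(group.compute()))
    # future behaviour: the caller supplies the expected labels up front
    if labels is None:
        raise ValueError("provide expected group labels when grouping lazily")
    return list(labels)

group = LazyArray(["a", "b", "a"])
eager = resolve_group_labels(group)  # triggers compute()
lazy = resolve_group_labels(
    LazyArray(["a", "b"]), eagerly_compute_group=False, labels=["a", "b"]
)  # no compute needed
```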

Bug fixes
~~~~~~~~~

- Fix inadvertent deep-copying of child data in DataTree.
- Fix inadvertent deep-copying of child data in DataTree (:issue:`9683`,
:pull:`9684`).
By `Stephan Hoyer <https://github.com/shoyer>`_.
- Avoid including parent groups when writing DataTree subgroups to Zarr or
netCDF (:pull:`9682`).
By `Stephan Hoyer <https://github.com/shoyer>`_.
- Fix regression in the interoperability of :py:meth:`DataArray.polyfit` and :py:meth:`xr.polyval` for date-time coordinates (:pull:`9691`).
By `Pascal Bourgault <https://github.com/aulemahal>`_.

Documentation
~~~~~~~~~~~~~

- Mention attribute peculiarities in docs/docstrings (:issue:`4798`, :pull:`9700`).
By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.


Internal Changes
~~~~~~~~~~~~~~~~
@@ -91,14 +108,6 @@ New Features
(:issue:`9427`, :pull:`9428`).
By `Alfonso Ladino <https://github.com/aladinor>`_.

Breaking changes
~~~~~~~~~~~~~~~~


Deprecations
~~~~~~~~~~~~


Bug fixes
~~~~~~~~~

2 changes: 1 addition & 1 deletion xarray/core/common.py
@@ -1094,7 +1094,7 @@ def _resample(
f"Received {type(freq)} instead."
)

rgrouper = ResolvedGrouper(grouper, group, self)
rgrouper = ResolvedGrouper(grouper, group, self, eagerly_compute_group=False)

return resample_cls(
self,
21 changes: 19 additions & 2 deletions xarray/core/dataarray.py
@@ -347,6 +347,7 @@ class DataArray(
attrs : dict_like or None, optional
Attributes to assign to the new instance. By default, an empty
attribute dictionary is initialized.
(see FAQ, :ref:`approach to metadata`)
indexes : py:class:`~xarray.Indexes` or dict-like, optional
For internal use only. For passing indexes objects to the
new DataArray, use the ``coords`` argument instead with a
@@ -6747,6 +6748,7 @@ def groupby(
*,
squeeze: Literal[False] = False,
restore_coord_dims: bool = False,
eagerly_compute_group: bool = True,
**groupers: Grouper,
) -> DataArrayGroupBy:
"""Returns a DataArrayGroupBy object for performing grouped operations.
@@ -6762,6 +6764,11 @@
restore_coord_dims : bool, default: False
If True, also restore the dimension order of multi-dimensional
coordinates.
eagerly_compute_group : bool, default: True
Whether to eagerly compute ``group`` when it is a chunked array.
This option exists to maintain backwards compatibility. Set to False
to opt in to the future behaviour, where ``group`` is not automatically
loaded into memory.
**groupers : Mapping of str to Grouper or Resampler
Mapping of variable name to group by to :py:class:`Grouper` or :py:class:`Resampler` object.
One of ``group`` or ``groupers`` must be provided.
@@ -6876,7 +6883,9 @@ def groupby(
)

_validate_groupby_squeeze(squeeze)
rgroupers = _parse_group_and_groupers(self, group, groupers)
rgroupers = _parse_group_and_groupers(
self, group, groupers, eagerly_compute_group=eagerly_compute_group
)
return DataArrayGroupBy(self, rgroupers, restore_coord_dims=restore_coord_dims)

@_deprecate_positional_args("v2024.07.0")
@@ -6891,6 +6900,7 @@ def groupby_bins(
squeeze: Literal[False] = False,
restore_coord_dims: bool = False,
duplicates: Literal["raise", "drop"] = "raise",
eagerly_compute_group: bool = True,
) -> DataArrayGroupBy:
"""Returns a DataArrayGroupBy object for performing grouped operations.
@@ -6927,6 +6937,11 @@
coordinates.
duplicates : {"raise", "drop"}, default: "raise"
If bin edges are not unique, raise ValueError or drop non-uniques.
eagerly_compute_group : bool, default: True
Whether to eagerly compute ``group`` when it is a chunked array.
This option exists to maintain backwards compatibility. Set to False
to opt in to the future behaviour, where ``group`` is not automatically
loaded into memory.
Returns
-------
@@ -6964,7 +6979,9 @@
precision=precision,
include_lowest=include_lowest,
)
rgrouper = ResolvedGrouper(grouper, group, self)
rgrouper = ResolvedGrouper(
grouper, group, self, eagerly_compute_group=eagerly_compute_group
)

return DataArrayGroupBy(
self,
21 changes: 19 additions & 2 deletions xarray/core/dataset.py
@@ -596,6 +596,7 @@ class Dataset(
attrs : dict-like, optional
Global attributes to save on this dataset.
(see FAQ, :ref:`approach to metadata`)
Examples
--------
@@ -10378,6 +10379,7 @@ def groupby(
*,
squeeze: Literal[False] = False,
restore_coord_dims: bool = False,
eagerly_compute_group: bool = True,
**groupers: Grouper,
) -> DatasetGroupBy:
"""Returns a DatasetGroupBy object for performing grouped operations.
@@ -10393,6 +10395,11 @@
restore_coord_dims : bool, default: False
If True, also restore the dimension order of multi-dimensional
coordinates.
eagerly_compute_group : bool, default: True
Whether to eagerly compute ``group`` when it is a chunked array.
This option exists to maintain backwards compatibility. Set to False
to opt in to the future behaviour, where ``group`` is not automatically
loaded into memory.
**groupers : Mapping of str to Grouper or Resampler
Mapping of variable name to group by to :py:class:`Grouper` or :py:class:`Resampler` object.
One of ``group`` or ``groupers`` must be provided.
@@ -10475,7 +10482,9 @@ def groupby(
)

_validate_groupby_squeeze(squeeze)
rgroupers = _parse_group_and_groupers(self, group, groupers)
rgroupers = _parse_group_and_groupers(
self, group, groupers, eagerly_compute_group=eagerly_compute_group
)

return DatasetGroupBy(self, rgroupers, restore_coord_dims=restore_coord_dims)

@@ -10491,6 +10500,7 @@ def groupby_bins(
squeeze: Literal[False] = False,
restore_coord_dims: bool = False,
duplicates: Literal["raise", "drop"] = "raise",
eagerly_compute_group: bool = True,
) -> DatasetGroupBy:
"""Returns a DatasetGroupBy object for performing grouped operations.
@@ -10527,6 +10537,11 @@
coordinates.
duplicates : {"raise", "drop"}, default: "raise"
If bin edges are not unique, raise ValueError or drop non-uniques.
eagerly_compute_group : bool, default: True
Whether to eagerly compute ``group`` when it is a chunked array.
This option exists to maintain backwards compatibility. Set to False
to opt in to the future behaviour, where ``group`` is not automatically
loaded into memory.
Returns
-------
@@ -10564,7 +10579,9 @@
precision=precision,
include_lowest=include_lowest,
)
rgrouper = ResolvedGrouper(grouper, group, self)
rgrouper = ResolvedGrouper(
grouper, group, self, eagerly_compute_group=eagerly_compute_group
)

return DatasetGroupBy(
self,
14 changes: 14 additions & 0 deletions xarray/core/datatree.py
@@ -1573,6 +1573,7 @@ def to_netcdf(
format: T_DataTreeNetcdfTypes | None = None,
engine: T_DataTreeNetcdfEngine | None = None,
group: str | None = None,
write_inherited_coords: bool = False,
compute: bool = True,
**kwargs,
):
@@ -1609,6 +1610,11 @@
group : str, optional
Path to the netCDF4 group in the given file to open as the root group
of the ``DataTree``. Currently, specifying a group is not supported.
write_inherited_coords : bool, default: False
If true, replicate inherited coordinates on all descendant nodes.
Otherwise, only write coordinates at the level at which they are
originally defined. This saves disk space, but requires opening the
full tree to load inherited coordinates.
compute : bool, default: True
If true compute immediately, otherwise return a
``dask.delayed.Delayed`` object that can be computed later.
@@ -1632,6 +1638,7 @@
format=format,
engine=engine,
group=group,
write_inherited_coords=write_inherited_coords,
compute=compute,
**kwargs,
)
@@ -1643,6 +1650,7 @@
encoding=None,
consolidated: bool = True,
group: str | None = None,
write_inherited_coords: bool = False,
compute: Literal[True] = True,
**kwargs,
):
@@ -1668,6 +1676,11 @@
after writing metadata for all groups.
group : str, optional
Group path. (a.k.a. `path` in zarr terminology.)
write_inherited_coords : bool, default: False
If true, replicate inherited coordinates on all descendant nodes.
Otherwise, only write coordinates at the level at which they are
originally defined. This saves disk space, but requires opening the
full tree to load inherited coordinates.
compute : bool, default: True
If true compute immediately, otherwise return a
``dask.delayed.Delayed`` object that can be computed later. Metadata
@@ -1690,6 +1703,7 @@
encoding=encoding,
consolidated=consolidated,
group=group,
write_inherited_coords=write_inherited_coords,
compute=compute,
**kwargs,
)
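The effect of ``write_inherited_coords`` documented above can be sketched as follows (a hypothetical `coords_to_write` helper, not the actual serialization code): with the flag off, each node writes only the coordinates defined on it; with the flag on, inherited parent coordinates are replicated onto every node.

```python
def coords_to_write(node_coords, inherited_coords, write_inherited_coords=False):
    """Coordinates written for one DataTree node (illustrative only)."""
    out = dict(node_coords)
    if write_inherited_coords:
        # replicate coordinates inherited from ancestor nodes on this node
        out.update(inherited_coords)
    return out

# a child node defining "y" while inheriting "time" from its parent
compact = coords_to_write({"y": [0, 1]}, {"time": [10, 20]})
replicated = coords_to_write({"y": [0, 1]}, {"time": [10, 20]}, write_inherited_coords=True)
```

The compact form saves disk space, at the cost of needing the full tree to recover inherited coordinates.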