
Commit

Updates
dcherian committed Jun 12, 2024
1 parent 06dc276 commit ac9bce1
Showing 4 changed files with 17 additions and 4 deletions.
5 changes: 5 additions & 0 deletions doc/user-guide/dask.rst
@@ -296,6 +296,11 @@ loaded into Dask or not:
Automatic parallelization with ``apply_ufunc`` and ``map_blocks``
-----------------------------------------------------------------

+.. tip::
+
+   Some problems can become embarrassingly parallel, and thus easy to parallelize automatically,
+   by rechunking to a frequency, e.g. ``ds.chunk(time="YE")``. See :py:meth:`Dataset.chunk` for more.
+
Almost all of xarray's built-in operations work on Dask arrays. If you want to
use a function that isn't wrapped by xarray, and have it applied in parallel on
each block of your xarray object, you have three options:
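As a rough illustration of what rechunking to a yearly frequency means, the sketch below groups consecutive daily timestamps by calendar year and reports how many fall in each group, which is the shape a per-year chunking would take. It uses only the Python standard library and is not xarray's implementation; `yearly_chunk_sizes` is a hypothetical helper name and the dates are made up.

```python
from datetime import date, timedelta
from itertools import groupby


def yearly_chunk_sizes(dates):
    """Count how many consecutive dates fall in each calendar year.

    Hypothetical helper for illustration only; assumes ``dates`` is sorted.
    """
    return tuple(
        len(list(group)) for _, group in groupby(dates, key=lambda d: d.year)
    )


# 800 consecutive daily timestamps starting on 2001-01-01 (made-up data).
start = date(2001, 1, 1)
dates = [start + timedelta(days=i) for i in range(800)]

sizes = yearly_chunk_sizes(dates)
print(sizes)  # (365, 365, 70)
```

Each resulting block holds exactly one year of data, so a per-year computation can run on every block independently.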
6 changes: 4 additions & 2 deletions xarray/core/dataarray.py
@@ -1356,11 +1356,13 @@ def chunk(
sizes along that dimension will not be updated; non-dask arrays will be
converted into dask arrays with a single block.
+Along datetime-like dimensions, a pandas frequency string is also accepted.
Parameters
----------
-chunks : int, "auto", tuple of int or mapping of Hashable to int, optional
+chunks : int, "auto", tuple of int or mapping of hashable to int or a pandas frequency string, optional
Chunk sizes along each dimension, e.g., ``5``, ``"auto"``, ``(5, 5)`` or
-``{"x": 5, "y": 5}``.
+``{"x": 5, "y": 5}`` or ``{"x": 5, "time": "YE"}``.
name_prefix : str, optional
Prefix for the name of the new dask array.
token : str, optional
7 changes: 5 additions & 2 deletions xarray/core/dataset.py
@@ -159,6 +159,7 @@
QueryParserOptions,
ReindexMethodOptions,
SideOptions,
+T_NormalizedChunks,
T_Xarray,
)
from xarray.core.weighted import DatasetWeighted
@@ -2755,9 +2756,11 @@ def _resolve_frequency(name: Hashable, freq: str) -> tuple[int]:
)
return chunks

-chunks_mapping_ints = {
+chunks_mapping_ints: T_NormalizedChunks = {
name: (
-_resolve_frequency(name, chunks) if isinstance(chunks, str) else chunks
+_resolve_frequency(name, chunks)
+if isinstance(chunks, str) and chunks != "auto"
+else chunks
)
for name, chunks in chunks_mapping.items()
}
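The effect of the added `chunks != "auto"` guard can be sketched in isolation. In the snippet below, `resolve_frequency_stub` is a hypothetical stand-in for the private `_resolve_frequency` helper (whose body is not shown in this diff) and `normalize_chunks` is an illustrative wrapper name; only the conditional expression mirrors the changed lines.

```python
def resolve_frequency_stub(name, freq):
    """Hypothetical stand-in for ``_resolve_frequency``; just tags its input."""
    return ("resolved", name, freq)


def normalize_chunks(chunks_mapping):
    """Resolve frequency strings, passing ints and ``"auto"`` through untouched."""
    return {
        name: (
            resolve_frequency_stub(name, chunks)
            if isinstance(chunks, str) and chunks != "auto"
            else chunks
        )
        for name, chunks in chunks_mapping.items()
    }


result = normalize_chunks({"x": 5, "y": "auto", "time": "YE"})
print(result)
# {'x': 5, 'y': 'auto', 'time': ('resolved', 'time', 'YE')}
```

Without the extra condition, `"auto"` is also a `str` and would be handed to the frequency resolver instead of being left as-is.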
3 changes: 3 additions & 0 deletions xarray/tests/test_dataset.py
@@ -1229,6 +1229,9 @@ def test_chunk_by_frequecy_errors(self):
ds["x"] = ("x", [1, 2, 3])
with pytest.raises(ValueError, match="datetime variables"):
ds.chunk(x="YE")
+ds["x"] = ("x", xr.date_range("2001-01-01", periods=3, freq="D"))
+with pytest.raises(ValueError, match="Invalid frequency"):
+ds.chunk(x="foo")

@requires_dask
def test_dask_is_lazy(self) -> None: