Merge remote-tracking branch 'upstream/main' into chunk-by-frequency
* upstream/main:
  [skip-ci] Try fixing hypothesis CI trigger (#9112)
  Undo custom padding-top. (#9107)
  add remaining core-dev citations [skip-ci][skip-rtd] (#9110)
  Add user survey announcement to docs (#9101)
  skip the `pandas` datetime roundtrip test with `pandas=3.0` (#9104)
  Adds Matt Savoie to CITATION.cff (#9103)
  [skip-ci] Fix skip-ci for hypothesis (#9102)
  open_datatree performance improvement on NetCDF, H5, and Zarr files (#9014)
  Migrate datatree io.py and common.py into xarray/core (#9011)
  Micro optimizations to improve indexing (#9002)
  (fix): don't handle time-dtypes as extension arrays in `from_dataframe` (#9042)
dcherian committed Jun 13, 2024
2 parents 8a980ef + 6554855 commit 566fd37
Showing 20 changed files with 556 additions and 308 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/hypothesis.yaml
@@ -39,9 +39,9 @@ jobs:
if: |
always()
&& (
(github.event_name == 'schedule' || github.event_name == 'workflow_dispatch')
|| needs.detect-ci-trigger.outputs.triggered == 'true'
|| contains( github.event.pull_request.labels.*.name, 'run-slow-hypothesis')
needs.detect-ci-trigger.outputs.triggered == 'false'
&& ( (github.event_name == 'schedule' || github.event_name == 'workflow_dispatch')
|| contains( github.event.pull_request.labels.*.name, 'run-slow-hypothesis'))
)
defaults:
run:
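Read as boolean logic, the change to the job's `if:` expression is easiest to see side by side. The Python sketch below models both versions. It assumes (the `detect-ci-trigger` job is not shown in this diff) that `triggered` is true when that job found the skip keyword, and it drops `always()`, which only overrides the default requirement that the needed jobs succeeded.

```python
from __future__ import annotations


def old_should_run(event_name: str, triggered: bool, labels: set[str]) -> bool:
    # Old expression: any one clause is enough, so a detected trigger
    # keyword starts the slow hypothesis job instead of skipping it.
    return (
        event_name in ("schedule", "workflow_dispatch")
        or triggered
        or "run-slow-hypothesis" in labels
    )


def new_should_run(event_name: str, triggered: bool, labels: set[str]) -> bool:
    # New expression: a detected trigger keyword vetoes the job; otherwise it
    # runs on scheduled/manual events or when the PR carries the label.
    return not triggered and (
        event_name in ("schedule", "workflow_dispatch")
        or "run-slow-hypothesis" in labels
    )
```

With these definitions, a pull request whose commit message contains the skip keyword runs the job under the old expression and skips it under the new one.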
5 changes: 5 additions & 0 deletions CITATION.cff
@@ -84,6 +84,11 @@ authors:
- family-names: "Scheick"
given-names: "Jessica"
orcid: "https://orcid.org/0000-0002-3421-4459"
- family-names: "Savoie"
given-names: "Matthew"
orcid: "https://orcid.org/0000-0002-8881-2550"
- family-names: "Littlejohns"
given-names: "Owen"
title: "xarray"
abstract: "N-D labeled arrays and datasets in Python."
license: Apache-2.0
7 changes: 2 additions & 5 deletions doc/_static/style.css
@@ -7,9 +7,8 @@ table.docutils td {
word-wrap: break-word;
}

div.bd-header-announcement {
background-color: unset;
color: #000;
.bd-header-announcement {
background-color: var(--pst-color-info-bg);
}

/* Reduce left and right margins */
@@ -222,8 +221,6 @@ main *:target::before {
}

body {
/* Add padding to body to avoid overlap with navbar. */
padding-top: var(--navbar-height);
width: 100%;
}

2 changes: 1 addition & 1 deletion doc/conf.py
@@ -242,7 +242,7 @@
Theme by the <a href="https://ebp.jupyterbook.org">Executable Book Project</a></p>""",
twitter_url="https://twitter.com/xarray_dev",
icon_links=[], # workaround for pydata/pydata-sphinx-theme#1220
announcement="🍾 <a href='https://github.com/pydata/xarray/discussions/8462'>Xarray is now 10 years old!</a> 🎉",
announcement="<a href='https://forms.gle/KEq7WviCdz9xTaJX6'>Xarray's 2024 User Survey is live now. Please take ~5 minutes to fill it out and help us improve Xarray.</a>",
)


22 changes: 16 additions & 6 deletions doc/whats-new.rst
@@ -17,7 +17,7 @@ What's New
.. _whats-new.2024.05.1:

v2024.05.1 (unreleased)
v2024.06 (unreleased)
-----------------------

New Features
@@ -28,6 +28,10 @@ Performance

- Small optimization to the netCDF4 and h5netcdf backends (:issue:`9058`, :pull:`9067`).
By `Deepak Cherian <https://github.com/dcherian>`_.
- Small optimizations to improve the indexing speed of datasets (:pull:`9002`).
By `Mark Harfouche <https://github.com/hmaarrfk>`_.
- Performance improvement in `open_datatree` method for Zarr, netCDF4 and h5netcdf backends (:issue:`8994`, :pull:`9014`).
By `Alfonso Ladino <https://github.com/aladinor>`_.


Breaking changes
@@ -40,6 +44,9 @@ Deprecations

Bug fixes
~~~~~~~~~
- Preserve conversion of timezone-aware pandas Datetime arrays to numpy object arrays
(:issue:`9026`, :pull:`9042`).
By `Ilan Gold <https://github.com/ilan-gold>`_.

- :py:meth:`DataArrayResample.interpolate` and :py:meth:`DatasetResample.interpolate` methods now
support arbitrary kwargs such as ``order`` for polynomial interpolation (:issue:`8762`).
@@ -54,6 +61,10 @@ Documentation

Internal Changes
~~~~~~~~~~~~~~~~
- Migrates the remainder of ``io.py`` to ``xarray/core/datatree_io.py`` and
``TreeAttrAccessMixin`` into ``xarray/core/common.py`` (:pull:`9011`).
By `Owen Littlejohns <https://github.com/owenlittlejohns>`_ and
`Tom Nicholas <https://github.com/TomNicholas>`_.


.. _whats-new.2024.05.0:
@@ -136,10 +147,9 @@ Internal Changes
By `Owen Littlejohns <https://github.com/owenlittlejohns>`_, `Matt Savoie
<https://github.com/flamingbear>`_ and `Tom Nicholas <https://github.com/TomNicholas>`_.
- ``transpose``, ``set_dims``, ``stack`` & ``unstack`` now use a ``dim`` kwarg
rather than ``dims`` or ``dimensions``. This is the final change to unify
xarray functions to use ``dim``. Using the existing kwarg will raise a
warning.
By `Maximilian Roos <https://github.com/max-sixty>`_
rather than ``dims`` or ``dimensions``. This is the final change to make xarray methods
consistent with their use of ``dim``. Using the existing kwarg will raise a
warning. By `Maximilian Roos <https://github.com/max-sixty>`_

.. _whats-new.2024.03.0:

@@ -2903,7 +2913,7 @@ Bug fixes
process (:issue:`4045`, :pull:`4684`). It also enables encoding and decoding standard
calendar dates with time units of nanoseconds (:pull:`4400`).
By `Spencer Clark <https://github.com/spencerkclark>`_ and `Mark Harfouche
<http://github.com/hmaarrfk>`_.
<https://github.com/hmaarrfk>`_.
- :py:meth:`DataArray.astype`, :py:meth:`Dataset.astype` and :py:meth:`Variable.astype` support
the ``order`` and ``subok`` parameters again. This fixes a regression introduced in version 0.16.1
(:issue:`4644`, :pull:`4683`).
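The ``dim`` entry in the changelog above describes accepting an old keyword for a while and warning when it is used. A generic way to do that is a renaming decorator. The sketch below uses a hypothetical `rename_kwarg` helper and a toy `transpose` function, not xarray's own implementation, and picks `FutureWarning` as the warning category purely for illustration.

```python
from __future__ import annotations

import functools
import warnings
from typing import Any, Callable


def rename_kwarg(old: str, new: str) -> Callable:
    """Hypothetical helper: accept ``old`` as an alias for ``new``, with a warning."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if old in kwargs:
                if new in kwargs:
                    raise TypeError(f"cannot pass both {old!r} and {new!r}")
                warnings.warn(
                    f"{old!r} is deprecated, use {new!r} instead",
                    FutureWarning,
                    stacklevel=2,
                )
                # Forward the value under the new name.
                kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@rename_kwarg("dims", "dim")
def transpose(*, dim: tuple[str, ...]) -> tuple[str, ...]:
    # Hypothetical stand-in for a method whose signature now only knows ``dim``.
    return tuple(reversed(dim))
```

Calling `transpose(dims=("x", "y"))` still returns `("y", "x")` but emits a warning, while `transpose(dim=("x", "y"))` is silent.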
27 changes: 27 additions & 0 deletions properties/test_pandas_roundtrip.py
@@ -9,6 +9,7 @@
import pytest

import xarray as xr
from xarray.tests import has_pandas_3

pytest.importorskip("hypothesis")
import hypothesis.extra.numpy as npst # isort:skip
@@ -30,6 +31,16 @@
)


datetime_with_tz_strategy = st.datetimes(timezones=st.timezones())
dataframe_strategy = pdst.data_frames(
[
pdst.column("datetime_col", elements=datetime_with_tz_strategy),
pdst.column("other_col", elements=st.integers()),
],
index=pdst.range_indexes(min_size=1, max_size=10),
)


@st.composite
def datasets_1d_vars(draw) -> xr.Dataset:
"""Generate datasets with only 1D variables
@@ -98,3 +109,19 @@ def test_roundtrip_pandas_dataframe(df) -> None:
roundtripped = arr.to_pandas()
pd.testing.assert_frame_equal(df, roundtripped)
xr.testing.assert_identical(arr, roundtripped.to_xarray())


@pytest.mark.skipif(
has_pandas_3,
reason="fails to roundtrip on pandas 3 (see https://github.com/pydata/xarray/issues/9098)",
)
@given(df=dataframe_strategy)
def test_roundtrip_pandas_dataframe_datetime(df) -> None:
# Need to name the indexes, otherwise Xarray names them 'dim_0', 'dim_1'.
df.index.name = "rows"
df.columns.name = "cols"
dataset = xr.Dataset.from_dataframe(df)
roundtripped = dataset.to_dataframe()
roundtripped.columns.name = "cols" # why?
pd.testing.assert_frame_equal(df, roundtripped)
xr.testing.assert_identical(dataset, roundtripped.to_xarray())
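The new test is a property-based roundtrip: generate timezone-aware data, convert it there and back, and assert that nothing changed. A dependency-free sketch of the same pattern is below. It swaps hypothesis for the standard `random` module and the DataFrame-to-Dataset conversion for a hypothetical ISO 8601 `encode`/`decode` pair, so it illustrates the testing idea rather than xarray's conversion code.

```python
from __future__ import annotations

import random
from datetime import datetime, timedelta, timezone


def encode(values: list[datetime]) -> list[str]:
    # Hypothetical serializer: ISO 8601 text keeps the UTC offset.
    return [v.isoformat() for v in values]


def decode(values: list[str]) -> list[datetime]:
    return [datetime.fromisoformat(v) for v in values]


def random_aware_datetime(rng: random.Random) -> datetime:
    # Random whole-second instant with a random whole-minute offset in (-24h, +24h).
    offset = timedelta(minutes=rng.randint(-(23 * 60 + 59), 23 * 60 + 59))
    naive = datetime(2000, 1, 1) + timedelta(seconds=rng.randint(0, 10**9))
    return naive.replace(tzinfo=timezone(offset))


def check_roundtrip(n_cases: int = 200, seed: int = 0) -> None:
    rng = random.Random(seed)
    for _ in range(n_cases):
        values = [random_aware_datetime(rng) for _ in range(rng.randint(1, 10))]
        result = decode(encode(values))
        # Aware datetimes compare by instant, so also check that offsets survive.
        assert result == values
        assert [r.utcoffset() for r in result] == [v.utcoffset() for v in values]
```

Checking the offsets separately matters because two aware datetimes with different offsets can still compare equal if they denote the same instant.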
18 changes: 9 additions & 9 deletions xarray/backends/api.py
@@ -36,7 +36,7 @@
from xarray.core.dataarray import DataArray
from xarray.core.dataset import Dataset, _get_chunk, _maybe_chunk
from xarray.core.indexes import Index
from xarray.core.types import ZarrWriteModes
from xarray.core.types import NetcdfWriteModes, ZarrWriteModes
from xarray.core.utils import is_remote_uri
from xarray.namedarray.daskmanager import DaskManager
from xarray.namedarray.parallelcompat import guess_chunkmanager
@@ -1120,7 +1120,7 @@ def open_mfdataset(
def to_netcdf(
dataset: Dataset,
path_or_file: str | os.PathLike | None = None,
mode: Literal["w", "a"] = "w",
mode: NetcdfWriteModes = "w",
format: T_NetcdfTypes | None = None,
group: str | None = None,
engine: T_NetcdfEngine | None = None,
@@ -1138,7 +1138,7 @@ def to_netcdf(
def to_netcdf(
dataset: Dataset,
path_or_file: None = None,
mode: Literal["w", "a"] = "w",
mode: NetcdfWriteModes = "w",
format: T_NetcdfTypes | None = None,
group: str | None = None,
engine: T_NetcdfEngine | None = None,
@@ -1155,7 +1155,7 @@ def to_netcdf(
def to_netcdf(
dataset: Dataset,
path_or_file: str | os.PathLike,
mode: Literal["w", "a"] = "w",
mode: NetcdfWriteModes = "w",
format: T_NetcdfTypes | None = None,
group: str | None = None,
engine: T_NetcdfEngine | None = None,
@@ -1173,7 +1173,7 @@ def to_netcdf(
def to_netcdf(
dataset: Dataset,
path_or_file: str | os.PathLike,
mode: Literal["w", "a"] = "w",
mode: NetcdfWriteModes = "w",
format: T_NetcdfTypes | None = None,
group: str | None = None,
engine: T_NetcdfEngine | None = None,
@@ -1191,7 +1191,7 @@ def to_netcdf(
def to_netcdf(
dataset: Dataset,
path_or_file: str | os.PathLike,
mode: Literal["w", "a"] = "w",
mode: NetcdfWriteModes = "w",
format: T_NetcdfTypes | None = None,
group: str | None = None,
engine: T_NetcdfEngine | None = None,
@@ -1209,7 +1209,7 @@ def to_netcdf(
def to_netcdf(
dataset: Dataset,
path_or_file: str | os.PathLike,
mode: Literal["w", "a"] = "w",
mode: NetcdfWriteModes = "w",
format: T_NetcdfTypes | None = None,
group: str | None = None,
engine: T_NetcdfEngine | None = None,
@@ -1226,7 +1226,7 @@ def to_netcdf(
def to_netcdf(
dataset: Dataset,
path_or_file: str | os.PathLike | None,
mode: Literal["w", "a"] = "w",
mode: NetcdfWriteModes = "w",
format: T_NetcdfTypes | None = None,
group: str | None = None,
engine: T_NetcdfEngine | None = None,
@@ -1241,7 +1241,7 @@ def to_netcdf(
def to_netcdf(
dataset: Dataset,
path_or_file: str | os.PathLike | None = None,
mode: Literal["w", "a"] = "w",
mode: NetcdfWriteModes = "w",
format: T_NetcdfTypes | None = None,
group: str | None = None,
engine: T_NetcdfEngine | None = None,
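The `to_netcdf` overloads above now share one alias instead of repeating `Literal["w", "a"]`. The sketch below assumes `NetcdfWriteModes` is exactly that literal (the diff swaps one for the other, but the definition in `xarray/core/types.py` is not shown here). It uses a hypothetical `to_netcdf_sketch` function to show how `typing.get_args` can recover the allowed values at runtime from the same alias that type checkers read.

```python
from __future__ import annotations

from typing import Literal, get_args

# Assumed definition, matching the annotation it replaces in the diff.
NetcdfWriteModes = Literal["w", "a"]


def to_netcdf_sketch(path: str, mode: NetcdfWriteModes = "w") -> str:
    """Hypothetical stand-in showing the alias used in a signature."""
    allowed = get_args(NetcdfWriteModes)  # ("w", "a")
    if mode not in allowed:
        raise ValueError(f"mode must be one of {allowed}, got {mode!r}")
    return f"{'writing' if mode == 'w' else 'appending to'} {path}"
```

Keeping the modes in one alias means a future mode only has to be added in a single place for every overload and any runtime check to pick it up.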
30 changes: 0 additions & 30 deletions xarray/backends/common.py
@@ -19,9 +19,6 @@
if TYPE_CHECKING:
from io import BufferedIOBase

from h5netcdf.legacyapi import Dataset as ncDatasetLegacyH5
from netCDF4 import Dataset as ncDataset

from xarray.core.dataset import Dataset
from xarray.core.datatree import DataTree
from xarray.core.types import NestedSequence
@@ -131,33 +128,6 @@ def _decode_variable_name(name):
return name


def _open_datatree_netcdf(
ncDataset: ncDataset | ncDatasetLegacyH5,
filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
**kwargs,
) -> DataTree:
from xarray.backends.api import open_dataset
from xarray.core.datatree import DataTree
from xarray.core.treenode import NodePath

ds = open_dataset(filename_or_obj, **kwargs)
tree_root = DataTree.from_dict({"/": ds})
with ncDataset(filename_or_obj, mode="r") as ncds:
for path in _iter_nc_groups(ncds):
subgroup_ds = open_dataset(filename_or_obj, group=path, **kwargs)

# TODO refactor to use __setitem__ once creation of new nodes by assigning Dataset works again
node_name = NodePath(path).name
new_node: DataTree = DataTree(name=node_name, data=subgroup_ds)
tree_root._set_item(
path,
new_node,
allow_overwrite=False,
new_nodes_along_path=True,
)
return tree_root


def _iter_nc_groups(root, parent="/"):
from xarray.core.treenode import NodePath

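The removed `_open_datatree_netcdf` walked the file's groups with `_iter_nc_groups`, whose body is cut off in this view. A recursive depth-first walk of that kind can be sketched with the standard library. Here `Group` is a hypothetical stand-in for a netCDF4/h5netcdf group (only its `.groups` mapping is modelled), and `PurePosixPath` plays the role of `NodePath`.

```python
from __future__ import annotations

from collections.abc import Iterator
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Group:
    # Hypothetical stand-in for a file group: child name -> child group.
    groups: dict[str, Group] = field(default_factory=dict)


def iter_groups(root: Group, parent: str | PurePosixPath = "/") -> Iterator[str]:
    """Yield the absolute path of every descendant group, depth first."""
    parent = PurePosixPath(parent)
    for name, child in root.groups.items():
        path = parent / name
        yield str(path)
        # Recurse so nested groups are reported under their full path.
        yield from iter_groups(child, parent=path)
```

Starting from a subgroup with a matching `parent` yields only that subtree, which mirrors how the new `open_datatree` passes `parent=parent` when a `group` is requested.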
54 changes: 50 additions & 4 deletions xarray/backends/h5netcdf_.py
@@ -3,15 +3,14 @@
import functools
import io
import os
from collections.abc import Iterable
from collections.abc import Callable, Iterable
from typing import TYPE_CHECKING, Any

from xarray.backends.common import (
BACKEND_ENTRYPOINTS,
BackendEntrypoint,
WritableCFDataStore,
_normalize_path,
_open_datatree_netcdf,
find_root_and_group,
)
from xarray.backends.file_manager import CachingFileManager, DummyFileManager
@@ -431,11 +430,58 @@ def open_dataset( # type: ignore[override] # allow LSP violation, not supporti
def open_datatree(
self,
filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
*,
mask_and_scale=True,
decode_times=True,
concat_characters=True,
decode_coords=True,
drop_variables: str | Iterable[str] | None = None,
use_cftime=None,
decode_timedelta=None,
group: str | Iterable[str] | Callable | None = None,
**kwargs,
) -> DataTree:
from h5netcdf.legacyapi import Dataset as ncDataset
from xarray.backends.api import open_dataset
from xarray.backends.common import _iter_nc_groups
from xarray.core.datatree import DataTree
from xarray.core.treenode import NodePath
from xarray.core.utils import close_on_error

return _open_datatree_netcdf(ncDataset, filename_or_obj, **kwargs)
filename_or_obj = _normalize_path(filename_or_obj)
store = H5NetCDFStore.open(
filename_or_obj,
group=group,
)
if group:
parent = NodePath("/") / NodePath(group)
else:
parent = NodePath("/")

manager = store._manager
ds = open_dataset(store, **kwargs)
tree_root = DataTree.from_dict({str(parent): ds})
for path_group in _iter_nc_groups(store.ds, parent=parent):
group_store = H5NetCDFStore(manager, group=path_group, **kwargs)
store_entrypoint = StoreBackendEntrypoint()
with close_on_error(group_store):
ds = store_entrypoint.open_dataset(
group_store,
mask_and_scale=mask_and_scale,
decode_times=decode_times,
concat_characters=concat_characters,
decode_coords=decode_coords,
drop_variables=drop_variables,
use_cftime=use_cftime,
decode_timedelta=decode_timedelta,
)
new_node: DataTree = DataTree(name=NodePath(path_group).name, data=ds)
tree_root._set_item(
path_group,
new_node,
allow_overwrite=False,
new_nodes_along_path=True,
)
return tree_root


BACKEND_ENTRYPOINTS["h5netcdf"] = ("h5netcdf", H5netcdfBackendEntrypoint)
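The performance change in `open_datatree` comes from how often the file is opened. The removed helper called `open_dataset(filename_or_obj, group=path, **kwargs)` once per group, whereas the new method opens one `H5NetCDFStore` and builds every group's store from its `manager`. The toy model below counts open calls for both patterns. `CountingOpener` and the two reader functions are hypothetical, and the model ignores xarray's file caching, so it illustrates the access pattern rather than measuring real savings.

```python
from __future__ import annotations


class CountingOpener:
    """Hypothetical file opener that records how many times the file is opened."""

    def __init__(self, layout: dict[str, dict[str, int]]) -> None:
        self.layout = layout  # group path -> variables stored in that group
        self.opens = 0

    def open(self) -> dict[str, dict[str, int]]:
        self.opens += 1
        return self.layout


def read_per_group(opener: CountingOpener, paths: list[str]) -> dict[str, dict[str, int]]:
    # Pattern of the removed helper: a fresh open for every group.
    return {p: opener.open()[p] for p in paths}


def read_shared_handle(opener: CountingOpener, paths: list[str]) -> dict[str, dict[str, int]]:
    # Pattern of the new method: open once, then read every group from that handle.
    handle = opener.open()
    return {p: handle[p] for p in paths}
```

Both readers return the same data; only the number of opens differs, growing with the number of groups in the first case and staying at one in the second.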