Skip to content

Commit

Permalink
Merge branch 'main' into kerchunk_docs
Browse files Browse the repository at this point in the history
  • Loading branch information
dcherian authored Jul 11, 2024
2 parents ebf80f5 + a69815f commit 41d7f23
Show file tree
Hide file tree
Showing 35 changed files with 917 additions and 480 deletions.
5 changes: 5 additions & 0 deletions .github/release.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
changelog:
exclude:
authors:
- dependabot
- pre-commit-ci
5 changes: 2 additions & 3 deletions ci/install-upstream-wheels.sh
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ $conda remove -y numba numbagg sparse
# temporarily remove numexpr
$conda remove -y numexpr
# temporarily remove backends
$conda remove -y cf_units hdf5 h5py netcdf4 pydap
$conda remove -y pydap
# forcibly remove packages to avoid artifacts
$conda remove -y --force \
numpy \
Expand All @@ -37,8 +37,7 @@ python -m pip install \
numpy \
scipy \
matplotlib \
pandas \
h5py
pandas
# for some reason pandas depends on pyarrow already.
# Remove once a `pyarrow` version compiled with `numpy>=2.0` is on `conda-forge`
python -m pip install \
Expand Down
2 changes: 1 addition & 1 deletion ci/requirements/all-but-dask.yml
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ dependencies:
- netcdf4
- numba
- numbagg
- numpy<2
- numpy
- packaging
- pandas
- pint>=0.22
Expand Down
2 changes: 1 addition & 1 deletion ci/requirements/doc.yml
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ dependencies:
- nbsphinx
- netcdf4>=1.5
- numba
- numpy>=1.21,<2
- numpy>=2
- packaging>=21.3
- pandas>=1.4,!=2.1.0
- pooch
Expand Down
2 changes: 1 addition & 1 deletion ci/requirements/environment-windows.yml
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ dependencies:
- netcdf4
- numba
- numbagg
- numpy<2
- numpy
- packaging
- pandas
# - pint>=0.22
Expand Down
2 changes: 1 addition & 1 deletion ci/requirements/environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ dependencies:
- numba
- numbagg
- numexpr
- numpy<2
- numpy
- opt_einsum
- packaging
- pandas
Expand Down
4 changes: 2 additions & 2 deletions doc/getting-started-guide/faq.rst
Original file line number Diff line number Diff line change
Expand Up @@ -352,9 +352,9 @@ Some packages may have additional functionality beyond what is shown here. You c
How does xarray handle missing values?
--------------------------------------

**xarray can handle missing values using ``np.NaN``**
**xarray can handle missing values using ``np.nan``**

- ``np.NaN`` is used to represent missing values in labeled arrays and datasets. It is a commonly used standard for representing missing or undefined numerical data in scientific computing. ``np.NaN`` is a constant value in NumPy that represents "Not a Number" or missing values.
- ``np.nan`` is used to represent missing values in labeled arrays and datasets. It is a commonly used standard for representing missing or undefined numerical data in scientific computing. ``np.nan`` is a constant value in NumPy that represents "Not a Number" or missing values.

- Most of xarray's computation methods are designed to automatically handle missing values appropriately.

Expand Down
4 changes: 2 additions & 2 deletions doc/user-guide/computation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -426,7 +426,7 @@ However, the functions also take missing values in the data into account:

.. ipython:: python
data = xr.DataArray([np.NaN, 2, 4])
data = xr.DataArray([np.nan, 2, 4])
weights = xr.DataArray([8, 1, 1])
data.weighted(weights).mean()
Expand All @@ -444,7 +444,7 @@ If the weights add up to to 0, ``sum`` returns 0:
data.weighted(weights).sum()
and ``mean``, ``std`` and ``var`` return ``NaN``:
and ``mean``, ``std`` and ``var`` return ``nan``:

.. ipython:: python
Expand Down
4 changes: 2 additions & 2 deletions doc/user-guide/interpolation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -292,8 +292,8 @@ Let's see how :py:meth:`~xarray.DataArray.interp` works on real data.
axes[0].set_title("Raw data")
# Interpolated data
new_lon = np.linspace(ds.lon[0], ds.lon[-1], ds.sizes["lon"] * 4)
new_lat = np.linspace(ds.lat[0], ds.lat[-1], ds.sizes["lat"] * 4)
new_lon = np.linspace(ds.lon[0].item(), ds.lon[-1].item(), ds.sizes["lon"] * 4)
new_lat = np.linspace(ds.lat[0].item(), ds.lat[-1].item(), ds.sizes["lat"] * 4)
dsi = ds.interp(lat=new_lat, lon=new_lon)
dsi.air.plot(ax=axes[1])
@savefig interpolation_sample3.png width=8in
Expand Down
5 changes: 3 additions & 2 deletions doc/user-guide/testing.rst
Original file line number Diff line number Diff line change
Expand Up @@ -239,9 +239,10 @@ If the array type you want to generate has an array API-compliant top-level name
you can use this neat trick:

.. ipython:: python
:okwarning:
from numpy import array_api as xp # available in numpy 1.26.0
import numpy as xp # compatible in numpy 2.0
# use `import numpy.array_api as xp` in numpy>=1.23,<2.0
from hypothesis.extra.array_api import make_strategies_namespace
Expand Down
17 changes: 16 additions & 1 deletion doc/whats-new.rst
Original file line number Diff line number Diff line change
Expand Up @@ -37,24 +37,39 @@ Deprecations

Bug fixes
~~~~~~~~~
- Fix scatter plot broadcasting unneccesarily. (:issue:`9129`, :pull:`9206`)
By `Jimmy Westling <https://github.com/illviljan>`_.
- Don't convert custom indexes to ``pandas`` indexes when computing a diff (:pull:`9157`)
By `Justus Magin <https://github.com/keewis>`_.
- Make :py:func:`testing.assert_allclose` work with numpy 2.0 (:issue:`9165`, :pull:`9166`).
By `Pontus Lurcock <https://github.com/pont-us>`_.
- Allow diffing objects with array attributes on variables (:issue:`9153`, :pull:`9169`).
By `Justus Magin <https://github.com/keewis>`_.
- ``numpy>=2`` compatibility in the ``netcdf4`` backend (:pull:`9136`).
By `Justus Magin <https://github.com/keewis>`_ and `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.
- Promote floating-point numeric datetimes before decoding (:issue:`9179`, :pull:`9182`).
By `Justus Magin <https://github.com/keewis>`_.
- Address regression introduced in :pull:`9002` that prevented objects returned
by py:meth:`DataArray.convert_calendar` to be indexed by a time index in
certain circumstances (:issue:`9138`, :pull:`9192`). By `Mark Harfouche
<https://github.com/hmaarrfk>`_ and `Spencer Clark
<https://github.com/spencerkclark>`.

- Fiy static typing of tolerance arguments by allowing `str` type (:issue:`8892`, :pull:`9194`).
By `Michael Niklas <https://github.com/headtr1ck>`_.
- Dark themes are now properly detected for ``html[data-theme=dark]``-tags (:pull:`9200`).
By `Dieter Werthmüller <https://github.com/prisae>`_.
- Reductions no longer fail for ``np.complex_`` dtype arrays when numbagg is
installed.
By `Maximilian Roos <https://github.com/max-sixty>`_

Documentation
~~~~~~~~~~~~~

- Adds a flow-chart diagram to help users navigate help resources (`Discussion #8990 <https://github.com/pydata/xarray/discussions/8990>`_).
By `Jessica Scheick <https://github.com/jessicas11>`_.
- Improvements to Zarr & chunking docs (:pull:`9139`, :pull:`9140`, :pull:`9132`)
By `Maximilian Roos <https://github.com/max-sixty>`_
By `Maximilian Roos <https://github.com/max-sixty>`_.


Internal Changes
Expand Down
39 changes: 23 additions & 16 deletions properties/test_pandas_roundtrip.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,6 @@
import pytest

import xarray as xr
from xarray.tests import has_pandas_3

pytest.importorskip("hypothesis")
import hypothesis.extra.numpy as npst # isort:skip
Expand All @@ -25,22 +24,34 @@

numeric_series = numeric_dtypes.flatmap(lambda dt: pdst.series(dtype=dt))


@st.composite
def dataframe_strategy(draw):
tz = draw(st.timezones())
dtype = pd.DatetimeTZDtype(unit="ns", tz=tz)

datetimes = st.datetimes(
min_value=pd.Timestamp("1677-09-21T00:12:43.145224193"),
max_value=pd.Timestamp("2262-04-11T23:47:16.854775807"),
timezones=st.just(tz),
)

df = pdst.data_frames(
[
pdst.column("datetime_col", elements=datetimes),
pdst.column("other_col", elements=st.integers()),
],
index=pdst.range_indexes(min_size=1, max_size=10),
)
return draw(df).astype({"datetime_col": dtype})


an_array = npst.arrays(
dtype=numeric_dtypes,
shape=npst.array_shapes(max_dims=2), # can only convert 1D/2D to pandas
)


datetime_with_tz_strategy = st.datetimes(timezones=st.timezones())
dataframe_strategy = pdst.data_frames(
[
pdst.column("datetime_col", elements=datetime_with_tz_strategy),
pdst.column("other_col", elements=st.integers()),
],
index=pdst.range_indexes(min_size=1, max_size=10),
)


@st.composite
def datasets_1d_vars(draw) -> xr.Dataset:
"""Generate datasets with only 1D variables
Expand Down Expand Up @@ -111,11 +122,7 @@ def test_roundtrip_pandas_dataframe(df) -> None:
xr.testing.assert_identical(arr, roundtripped.to_xarray())


@pytest.mark.skipif(
has_pandas_3,
reason="fails to roundtrip on pandas 3 (see https://github.com/pydata/xarray/issues/9098)",
)
@given(df=dataframe_strategy)
@given(df=dataframe_strategy())
def test_roundtrip_pandas_dataframe_datetime(df) -> None:
# Need to name the indexes, otherwise Xarray names them 'dim_0', 'dim_1'.
df.index.name = "rows"
Expand Down
12 changes: 11 additions & 1 deletion xarray/coding/calendar_ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,10 @@

from xarray.coding.cftime_offsets import date_range_like, get_date_type
from xarray.coding.cftimeindex import CFTimeIndex
from xarray.coding.times import _should_cftime_be_used, convert_times
from xarray.coding.times import (
_should_cftime_be_used,
convert_times,
)
from xarray.core.common import _contains_datetime_like_objects, is_np_datetime_like

try:
Expand Down Expand Up @@ -222,6 +225,13 @@ def convert_calendar(
# Remove NaN that where put on invalid dates in target calendar
out = out.where(out[dim].notnull(), drop=True)

if use_cftime:
# Reassign times to ensure time index of output is a CFTimeIndex
# (previously it was an Index due to the presence of NaN values).
# Note this is not needed in the case that the output time index is
# a DatetimeIndex, since DatetimeIndexes can handle NaN values.
out[dim] = CFTimeIndex(out[dim].data)

if missing is not None:
time_target = date_range_like(time, calendar=calendar, use_cftime=use_cftime)
out = out.reindex({dim: time_target}, fill_value=missing)
Expand Down
24 changes: 12 additions & 12 deletions xarray/coding/times.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@
)
from xarray.core import indexing
from xarray.core.common import contains_cftime_datetimes, is_np_datetime_like
from xarray.core.duck_array_ops import asarray
from xarray.core.duck_array_ops import asarray, ravel, reshape
from xarray.core.formatting import first_n_items, format_timestamp, last_item
from xarray.core.pdcompat import nanosecond_precision_timestamp
from xarray.core.utils import emit_user_level_warning
Expand Down Expand Up @@ -315,7 +315,7 @@ def decode_cf_datetime(
cftime.num2date
"""
num_dates = np.asarray(num_dates)
flat_num_dates = num_dates.ravel()
flat_num_dates = ravel(num_dates)
if calendar is None:
calendar = "standard"

Expand Down Expand Up @@ -348,7 +348,7 @@ def decode_cf_datetime(
else:
dates = _decode_datetime_with_pandas(flat_num_dates, units, calendar)

return dates.reshape(num_dates.shape)
return reshape(dates, num_dates.shape)


def to_timedelta_unboxed(value, **kwargs):
Expand All @@ -369,8 +369,8 @@ def decode_cf_timedelta(num_timedeltas, units: str) -> np.ndarray:
"""
num_timedeltas = np.asarray(num_timedeltas)
units = _netcdf_to_numpy_timeunit(units)
result = to_timedelta_unboxed(num_timedeltas.ravel(), unit=units)
return result.reshape(num_timedeltas.shape)
result = to_timedelta_unboxed(ravel(num_timedeltas), unit=units)
return reshape(result, num_timedeltas.shape)


def _unit_timedelta_cftime(units: str) -> timedelta:
Expand Down Expand Up @@ -428,7 +428,7 @@ def infer_datetime_units(dates) -> str:
'hours', 'minutes' or 'seconds' (the first one that can evenly divide all
unique time deltas in `dates`)
"""
dates = np.asarray(dates).ravel()
dates = ravel(np.asarray(dates))
if np.asarray(dates).dtype == "datetime64[ns]":
dates = to_datetime_unboxed(dates)
dates = dates[pd.notnull(dates)]
Expand Down Expand Up @@ -456,7 +456,7 @@ def infer_timedelta_units(deltas) -> str:
{'days', 'hours', 'minutes' 'seconds'} (the first one that can evenly
divide all unique time deltas in `deltas`)
"""
deltas = to_timedelta_unboxed(np.asarray(deltas).ravel())
deltas = to_timedelta_unboxed(ravel(np.asarray(deltas)))
unique_timedeltas = np.unique(deltas[pd.notnull(deltas)])
return _infer_time_units_from_diff(unique_timedeltas)

Expand Down Expand Up @@ -643,7 +643,7 @@ def encode_datetime(d):
except TypeError:
return np.nan if d is None else cftime.date2num(d, units, calendar)

return np.array([encode_datetime(d) for d in dates.ravel()]).reshape(dates.shape)
return reshape(np.array([encode_datetime(d) for d in ravel(dates)]), dates.shape)


def cast_to_int_if_safe(num) -> np.ndarray:
Expand Down Expand Up @@ -753,7 +753,7 @@ def _eagerly_encode_cf_datetime(
# Wrap the dates in a DatetimeIndex to do the subtraction to ensure
# an OverflowError is raised if the ref_date is too far away from
# dates to be encoded (GH 2272).
dates_as_index = pd.DatetimeIndex(dates.ravel())
dates_as_index = pd.DatetimeIndex(ravel(dates))
time_deltas = dates_as_index - ref_date

# retrieve needed units to faithfully encode to int64
Expand Down Expand Up @@ -791,7 +791,7 @@ def _eagerly_encode_cf_datetime(
floor_division = True

num = _division(time_deltas, time_delta, floor_division)
num = num.values.reshape(dates.shape)
num = reshape(num.values, dates.shape)

except (OutOfBoundsDatetime, OverflowError, ValueError):
num = _encode_datetime_with_cftime(dates, units, calendar)
Expand Down Expand Up @@ -879,7 +879,7 @@ def _eagerly_encode_cf_timedelta(
units = data_units

time_delta = _time_units_to_timedelta64(units)
time_deltas = pd.TimedeltaIndex(timedeltas.ravel())
time_deltas = pd.TimedeltaIndex(ravel(timedeltas))

# retrieve needed units to faithfully encode to int64
needed_units = data_units
Expand Down Expand Up @@ -911,7 +911,7 @@ def _eagerly_encode_cf_timedelta(
floor_division = True

num = _division(time_deltas, time_delta, floor_division)
num = num.values.reshape(timedeltas.shape)
num = reshape(num.values, timedeltas.shape)

if dtype is not None:
num = _cast_to_dtype_if_safe(num, dtype)
Expand Down
16 changes: 10 additions & 6 deletions xarray/coding/variables.py
Original file line number Diff line number Diff line change
Expand Up @@ -516,10 +516,13 @@ def encode(self, variable: Variable, name: T_Name = None) -> Variable:
dims, data, attrs, encoding = unpack_for_encoding(variable)

pop_to(encoding, attrs, "_Unsigned")
signed_dtype = np.dtype(f"i{data.dtype.itemsize}")
# we need the on-disk type here
# trying to get it from encoding, resort to an int with the same precision as data.dtype if not available
signed_dtype = np.dtype(encoding.get("dtype", f"i{data.dtype.itemsize}"))
if "_FillValue" in attrs:
new_fill = signed_dtype.type(attrs["_FillValue"])
attrs["_FillValue"] = new_fill
new_fill = np.array(attrs["_FillValue"])
# use view here to prevent OverflowError
attrs["_FillValue"] = new_fill.view(signed_dtype).item()
data = duck_array_ops.astype(duck_array_ops.around(data), signed_dtype)

return Variable(dims, data, attrs, encoding, fastpath=True)
Expand All @@ -535,10 +538,11 @@ def decode(self, variable: Variable, name: T_Name = None) -> Variable:
if unsigned == "true":
unsigned_dtype = np.dtype(f"u{data.dtype.itemsize}")
transform = partial(np.asarray, dtype=unsigned_dtype)
data = lazy_elemwise_func(data, transform, unsigned_dtype)
if "_FillValue" in attrs:
new_fill = unsigned_dtype.type(attrs["_FillValue"])
attrs["_FillValue"] = new_fill
new_fill = np.array(attrs["_FillValue"], dtype=data.dtype)
# use view here to prevent OverflowError
attrs["_FillValue"] = new_fill.view(unsigned_dtype).item()
data = lazy_elemwise_func(data, transform, unsigned_dtype)
elif data.dtype.kind == "u":
if unsigned == "false":
signed_dtype = np.dtype(f"i{data.dtype.itemsize}")
Expand Down
Loading

0 comments on commit 41d7f23

Please sign in to comment.