Merge remote-tracking branch 'upstream/master' into indexes/dataarray
* upstream/master:
  Added fill_value for unstack (pydata#3541)
  Add DatasetGroupBy.quantile (pydata#3527)
  ensure rename does not change index type (pydata#3532)
  Leave empty slot when not using accessors
  interpolate_na: Add max_gap support. (pydata#3302)
  units & deprecation merge (pydata#3530)
  Fix set_index when an existing dimension becomes a level (pydata#3520)
  add Variable._replace (pydata#3528)
  Tests for module-level functions with units (pydata#3493)
  Harmonize `FillValue` and `missing_value` during encoding and decoding steps (pydata#3502)
  FUNDING.yml (pydata#3523)
  Allow appending datetime & boolean variables to zarr stores (pydata#3504)
  warn if dim is passed to rolling operations. (pydata#3513)
  Deprecate allow_lazy (pydata#3435)
  Recursive tokenization (pydata#3515)
dcherian committed Nov 17, 2019
2 parents aefa5e3 + 56c16e4 commit 747962d
Showing 23 changed files with 1,687 additions and 192 deletions.
2 changes: 2 additions & 0 deletions .github/FUNDING.yml
@@ -0,0 +1,2 @@
+github: numfocus
+custom: http://numfocus.org/donate-to-xarray
3 changes: 3 additions & 0 deletions doc/computation.rst
@@ -95,6 +95,9 @@ for filling missing values via 1D interpolation.
 Note that xarray slightly diverges from the pandas ``interpolate`` syntax by
 providing the ``use_coordinate`` keyword which facilitates a clear specification
 of which values to use as the index in the interpolation.
+xarray also provides the ``max_gap`` keyword argument to limit the interpolation to
+data gaps of length ``max_gap`` or smaller. See :py:meth:`~xarray.DataArray.interpolate_na`
+for more.
 
 Aggregation
 ===========
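
Editor's note: to make the new ``max_gap`` behaviour concrete, here is a minimal
sketch (array values invented for illustration)::

    import numpy as np
    import xarray as xr

    da = xr.DataArray(
        [np.nan, 1.0, np.nan, np.nan, 4.0, np.nan],
        dims="x",
        coords={"x": [0, 1, 2, 3, 4, 5]},
    )
    # The interior gap spans x=1 to x=4 (length 3), so max_gap=3 fills it,
    # while the leading/trailing NaNs stay untouched (no extrapolation).
    da.interpolate_na(dim="x", max_gap=3)
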
11 changes: 6 additions & 5 deletions doc/conf.py
@@ -340,9 +340,10 @@
 # Example configuration for intersphinx: refer to the Python standard library.
 intersphinx_mapping = {
     "python": ("https://docs.python.org/3/", None),
-    "pandas": ("https://pandas.pydata.org/pandas-docs/stable/", None),
-    "iris": ("http://scitools.org.uk/iris/docs/latest/", None),
-    "numpy": ("https://docs.scipy.org/doc/numpy/", None),
-    "numba": ("https://numba.pydata.org/numba-doc/latest/", None),
-    "matplotlib": ("https://matplotlib.org/", None),
+    "pandas": ("https://pandas.pydata.org/pandas-docs/stable", None),
+    "iris": ("https://scitools.org.uk/iris/docs/latest", None),
+    "numpy": ("https://docs.scipy.org/doc/numpy", None),
+    "scipy": ("https://docs.scipy.org/doc/scipy/reference", None),
+    "numba": ("https://numba.pydata.org/numba-doc/latest", None),
+    "matplotlib": ("https://matplotlib.org", None),
 }
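
Editor's note: the ``intersphinx_mapping`` entries are what let cross-references
such as :py:func:`scipy.interpolate.interp1d` (used in docstrings elsewhere in this
commit) resolve to the external projects' documentation. A minimal sketch of such a
configuration in a Sphinx ``conf.py``::

    # Each key maps a project name to (base URL, objects.inv location);
    # None means "fetch <base URL>/objects.inv".
    intersphinx_mapping = {
        "scipy": ("https://docs.scipy.org/doc/scipy/reference", None),
    }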
33 changes: 31 additions & 2 deletions doc/whats-new.rst
@@ -38,6 +38,13 @@ Breaking changes
 
 New Features
 ~~~~~~~~~~~~
+
+- Added the ``fill_value`` option to :py:meth:`~xarray.DataArray.unstack` and
+  :py:meth:`~xarray.Dataset.unstack` (:issue:`3518`).
+  By `Keisuke Fujii <https://github.com/fujiisoup>`_.
+- Added the ``max_gap`` kwarg to :py:meth:`~xarray.DataArray.interpolate_na` and
+  :py:meth:`~xarray.Dataset.interpolate_na`. This controls the maximum size of the data
+  gap that will be filled by interpolation. By `Deepak Cherian <https://github.com/dcherian>`_.
 - :py:meth:`Dataset.drop_sel` & :py:meth:`DataArray.drop_sel` have been added for dropping labels.
   :py:meth:`Dataset.drop_vars` & :py:meth:`DataArray.drop_vars` have been added for
   dropping variables (including coordinates). The existing ``drop`` methods remain as a backward compatible
@@ -73,12 +80,22 @@ New Features
   for xarray objects. Note that xarray objects with a dask.array backend already used
   deterministic hashing in previous releases; this change implements it when whole
   xarray objects are embedded in a dask graph, e.g. when :meth:`DataArray.map` is
-  invoked. (:issue:`3378`, :pull:`3446`)
+  invoked. (:issue:`3378`, :pull:`3446`, :pull:`3515`)
   By `Deepak Cherian <https://github.com/dcherian>`_ and
   `Guido Imperiale <https://github.com/crusaderky>`_.
+- Add the documented-but-missing :py:meth:`xarray.core.groupby.DatasetGroupBy.quantile`.
+  (:issue:`3525`, :pull:`3527`). By `Justus Magin <https://github.com/keewis>`_.
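
Editor's note: a short sketch of the newly added method (data invented)::

    import xarray as xr

    ds = xr.Dataset({"a": ("x", [0.0, 1.0, 2.0, 3.0])}, coords={"x": [0, 0, 1, 1]})
    # DatasetGroupBy.quantile now exists, mirroring DataArrayGroupBy.quantile
    ds.groupby("x").quantile(0.5)  # median of "a" within each group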

 Bug fixes
 ~~~~~~~~~
+- Ensure an index of type ``CFTimeIndex`` is not converted to a ``DatetimeIndex`` when
+  calling :py:meth:`Dataset.rename` (also :py:meth:`Dataset.rename_dims`
+  and :py:meth:`xr.Dataset.rename_vars`). By `Mathias Hauser <https://github.com/mathause>`_
+  (:issue:`3522`).
+- Fix a bug in ``set_index`` when an existing dimension becomes a level variable of a
+  MultiIndex. (:pull:`3520`) By `Keisuke Fujii <https://github.com/fujiisoup>`_.
+- Harmonize ``_FillValue`` and ``missing_value`` during encoding and decoding steps.
+  (:pull:`3502`) By `Anderson Banihirwe <https://github.com/andersy005>`_.
 - Fix regression introduced in v0.14.0 that would cause a crash if dask is installed
   but cloudpickle isn't (:issue:`3401`). By `Rhys Doyle <https://github.com/rdoyle45>`_.
 - Fix grouping over variables with NaNs. (:issue:`2383`, :pull:`3406`).
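
Editor's note: the ``CFTimeIndex`` fix above can be illustrated as follows
(assumes the cftime package is installed; names invented)::

    import xarray as xr

    times = xr.cftime_range("2000-01-01", periods=3, calendar="noleap")
    ds = xr.Dataset(coords={"time": times})
    renamed = ds.rename({"time": "t"})
    # Previously the index could silently become a pandas DatetimeIndex;
    # after the fix its CFTimeIndex type is preserved.
    assert type(renamed.indexes["t"]).__name__ == "CFTimeIndex"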
@@ -88,9 +105,14 @@ Bug fixes
   By `Deepak Cherian <https://github.com/dcherian>`_.
 - Sync with cftime by removing ``dayofwk=-1`` for cftime>=1.0.4.
   By `Anderson Banihirwe <https://github.com/andersy005>`_.
+- Rolling reduction operations no longer compute dask arrays by default. (:issue:`3161`).
+  In addition, the ``allow_lazy`` kwarg to ``reduce`` is deprecated.
+  By `Deepak Cherian <https://github.com/dcherian>`_.
 - Fix :py:meth:`xarray.core.groupby.DataArrayGroupBy.reduce` and
   :py:meth:`xarray.core.groupby.DatasetGroupBy.reduce` when reducing over multiple dimensions.
   (:issue:`3402`). By `Deepak Cherian <https://github.com/dcherian/>`_.
+- Allow appending datetime and bool data variables to zarr stores.
+  (:issue:`3480`). By `Akihiro Matsukawa <https://github.com/amatsukawa/>`_.
 
 Documentation
 ~~~~~~~~~~~~~
@@ -111,7 +133,8 @@ Internal Changes
 ~~~~~~~~~~~~~~~~
 
 - Added integration tests against `pint <https://pint.readthedocs.io/>`_.
-  (:pull:`3238`, :pull:`3447`, :pull:`3508`) by `Justus Magin <https://github.com/keewis>`_.
+  (:pull:`3238`, :pull:`3447`, :pull:`3493`, :pull:`3508`)
+  by `Justus Magin <https://github.com/keewis>`_.
 
 .. note::
 
@@ -130,6 +153,9 @@ Internal Changes
 - Enable type checking on default sentinel values (:pull:`3472`)
   By `Maximilian Roos <https://github.com/max-sixty>`_
 
+- Add :py:meth:`Variable._replace` for simpler replacing of a subset of attributes (:pull:`3528`)
+  By `Maximilian Roos <https://github.com/max-sixty>`_
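
Editor's note: ``Variable._replace`` is internal API; a hypothetical sketch of the
pattern it enables (keyword names may differ)::

    import numpy as np
    from xarray import Variable

    var = Variable(("x",), np.arange(3), attrs={"units": "m"})
    # Swap out just the data, reusing dims/attrs/encoding from the original
    var2 = var._replace(data=np.zeros(3))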

.. _whats-new.0.14.0:

v0.14.0 (14 Oct 2019)
@@ -217,6 +243,9 @@ Bug fixes
   By `Deepak Cherian <https://github.com/dcherian>`_.
 - Fix error in concatenating unlabeled dimensions (:pull:`3362`).
   By `Deepak Cherian <https://github.com/dcherian/>`_.
+- Warn if the ``dim`` kwarg is passed to rolling operations. This is redundant since a dimension is
+  specified when the :py:class:`DatasetRolling` or :py:class:`DataArrayRolling` object is created.
+  (:pull:`3362`). By `Deepak Cherian <https://github.com/dcherian/>`_.
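
Editor's note: a sketch of the pattern the new warning targets (data invented)::

    import numpy as np
    import xarray as xr

    da = xr.DataArray(np.arange(10.0), dims="time")
    roller = da.rolling(time=3)  # the dimension is fixed here,
    roller.mean()                # so passing ``dim`` again to the reduction is redundant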

Documentation
~~~~~~~~~~~~~
7 changes: 5 additions & 2 deletions xarray/backends/api.py
@@ -1234,15 +1234,18 @@ def _validate_datatypes_for_zarr_append(dataset):
     def check_dtype(var):
         if (
             not np.issubdtype(var.dtype, np.number)
+            and not np.issubdtype(var.dtype, np.datetime64)
+            and not np.issubdtype(var.dtype, np.bool)
             and not coding.strings.is_unicode_dtype(var.dtype)
             and not var.dtype == object
         ):
             # and not re.match('^bytes[1-9]+$', var.dtype.name)):
             raise ValueError(
                 "Invalid dtype for data variable: {} "
                 "dtype must be a subtype of number, "
-                "a fixed sized string, a fixed size "
-                "unicode string or an object".format(var)
+                "datetime, bool, a fixed sized string, "
+                "a fixed size unicode string or an "
+                "object".format(var)
             )
 
     for k in dataset.data_vars.values():
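
Editor's note: the relaxed dtype check above corresponds to the zarr-append entry in
the changelog. A hedged sketch of the now-permitted round trip (store path invented;
requires the zarr package)::

    import numpy as np
    import pandas as pd
    import xarray as xr

    ds = xr.Dataset(
        {
            "flag": ("time", np.array([True, False])),
            "when": ("time", pd.date_range("2000-01-01", periods=2)),
        }
    )
    ds.to_zarr("example.zarr", mode="w")
    # Appending datetime64 and bool variables no longer fails validation
    ds.to_zarr("example.zarr", append_dim="time")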
14 changes: 10 additions & 4 deletions xarray/coding/variables.py
@@ -8,7 +8,6 @@
 
 from ..core import dtypes, duck_array_ops, indexing
 from ..core.pycompat import dask_array_type
-from ..core.utils import equivalent
 from ..core.variable import Variable
 
 
@@ -152,18 +151,25 @@ def encode(self, variable, name=None):
         fv = encoding.get("_FillValue")
         mv = encoding.get("missing_value")
 
-        if fv is not None and mv is not None and not equivalent(fv, mv):
+        if (
+            fv is not None
+            and mv is not None
+            and not duck_array_ops.allclose_or_equiv(fv, mv)
+        ):
             raise ValueError(
-                "Variable {!r} has multiple fill values {}. "
-                "Cannot encode data. ".format(name, [fv, mv])
+                f"Variable {name!r} has conflicting _FillValue ({fv}) and missing_value ({mv}). Cannot encode data."
             )
 
         if fv is not None:
+            # Ensure _FillValue is cast to same dtype as data's
+            encoding["_FillValue"] = data.dtype.type(fv)
             fill_value = pop_to(encoding, attrs, "_FillValue", name=name)
             if not pd.isnull(fill_value):
                 data = duck_array_ops.fillna(data, fill_value)
 
         if mv is not None:
+            # Ensure missing_value is cast to same dtype as data's
+            encoding["missing_value"] = data.dtype.type(mv)
             fill_value = pop_to(encoding, attrs, "missing_value", name=name)
             if not pd.isnull(fill_value) and fv is None:
                 data = duck_array_ops.fillna(data, fill_value)
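
Editor's note: a sketch of the behaviour the rewritten check enforces (names
invented; the error is raised when the variable is encoded, e.g. on write)::

    import numpy as np
    import xarray as xr

    da = xr.DataArray([0.0, np.nan], dims="x", name="t")
    da.encoding["_FillValue"] = -9999.0
    da.encoding["missing_value"] = -1.0  # conflicts with _FillValue
    # da.to_netcdf("out.nc") would now raise:
    #   ValueError: Variable 't' has conflicting _FillValue (-9999.0) and
    #   missing_value (-1.0). Cannot encode data.
    # Equal (or allclose) values are accepted and harmonized instead.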
17 changes: 4 additions & 13 deletions xarray/core/common.py
@@ -43,14 +43,12 @@ def _reduce_method(cls, func: Callable, include_skipna: bool, numeric_only: bool
     if include_skipna:
 
         def wrapped_func(self, dim=None, axis=None, skipna=None, **kwargs):
-            return self.reduce(
-                func, dim, axis, skipna=skipna, allow_lazy=True, **kwargs
-            )
+            return self.reduce(func, dim, axis, skipna=skipna, **kwargs)
 
     else:
 
         def wrapped_func(self, dim=None, axis=None, **kwargs):  # type: ignore
-            return self.reduce(func, dim, axis, allow_lazy=True, **kwargs)
+            return self.reduce(func, dim, axis, **kwargs)
 
     return wrapped_func
 
@@ -83,20 +81,13 @@ def _reduce_method(cls, func: Callable, include_skipna: bool, numeric_only: bool
 
         def wrapped_func(self, dim=None, skipna=None, **kwargs):
             return self.reduce(
-                func,
-                dim,
-                skipna=skipna,
-                numeric_only=numeric_only,
-                allow_lazy=True,
-                **kwargs,
+                func, dim, skipna=skipna, numeric_only=numeric_only, **kwargs
             )
 
     else:
 
         def wrapped_func(self, dim=None, **kwargs):  # type: ignore
-            return self.reduce(
-                func, dim, numeric_only=numeric_only, allow_lazy=True, **kwargs
-            )
+            return self.reduce(func, dim, numeric_only=numeric_only, **kwargs)
 
     return wrapped_func
 
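
Editor's note: with ``allow_lazy`` retired from these wrappers, aggregations on
dask-backed objects stay lazy by default. A sketch (requires dask)::

    import numpy as np
    import xarray as xr

    da = xr.DataArray(np.arange(6.0), dims="x").chunk({"x": 2})
    result = da.mean("x")  # builds a dask graph; nothing is computed yet
    result.compute()       # triggers the actual reduction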
84 changes: 57 additions & 27 deletions xarray/core/dataarray.py
@@ -48,7 +48,7 @@
     assert_coordinate_consistent,
     remap_label_indexers,
 )
-from .dataset import Dataset, merge_indexes, split_indexes
+from .dataset import Dataset, split_indexes
 from .formatting import format_item
 from .indexes import Indexes, copy_indexes, default_indexes
 from .merge import PANDAS_TYPES, _extract_indexes_from_coords
@@ -249,14 +249,14 @@ class DataArray(AbstractArray, DataWithCoords):
         Dictionary for holding arbitrary metadata.
     """
 
-    _accessors: Optional[Dict[str, Any]]  # noqa
+    _cache: Dict[str, Any]
     _coords: Dict[Any, Variable]
     _indexes: Optional[Dict[Hashable, pd.Index]]
     _name: Optional[Hashable]
     _variable: Variable
 
     __slots__ = (
-        "_accessors",
+        "_cache",
         "_coords",
         "_file_obj",
         "_indexes",
@@ -376,7 +376,6 @@ def __init__(
         assert isinstance(coords, dict)
         self._coords = coords
         self._name = name
-        self._accessors = None
 
         # TODO(shoyer): document this argument, once it becomes part of the
         # public interface.
@@ -772,7 +771,9 @@ def reset_coords(
         return dataset
 
     def __dask_tokenize__(self):
-        return (type(self), self._variable, self._coords, self._name)
+        from dask.base import normalize_token
+
+        return normalize_token((type(self), self._variable, self._coords, self._name))
 
     def __dask_graph__(self):
         return self._to_temp_dataset().__dask_graph__()
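
Editor's note: delegating to ``normalize_token`` makes tokenization recurse into the
underlying variables. The user-visible property (requires dask)::

    import dask.base
    import numpy as np
    import xarray as xr

    da = xr.DataArray(np.arange(3), dims="x", name="a")
    # Deterministic hashing: equal objects produce equal tokens, even across copies
    assert dask.base.tokenize(da) == dask.base.tokenize(da.copy(deep=True))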
@@ -1617,10 +1618,10 @@ def set_index(
         --------
         DataArray.reset_index
         """
-        _check_inplace(inplace)
-        indexes = either_dict_or_kwargs(indexes, indexes_kwargs, "set_index")
-        coords, _ = merge_indexes(indexes, self._coords, set(), append=append)
-        return self._replace(coords=coords)
+        ds = self._to_temp_dataset().set_index(
+            indexes, append=append, inplace=inplace, **indexes_kwargs
+        )
+        return self._from_temp_dataset(ds)
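
Editor's note: ``set_index`` now round-trips through a temporary Dataset, so its
behaviour matches :py:meth:`Dataset.set_index`. A short usage sketch (data invented)::

    import xarray as xr

    da = xr.DataArray(
        [0, 1],
        dims="x",
        coords={"x": [10, 20], "a": ("x", ["p", "q"])},
    )
    # Replace the "x" index with the non-dimension coordinate "a"
    reindexed = da.set_index(x="a")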

def reset_index(
self,
@@ -1743,7 +1744,9 @@ def stack(
         return self._from_temp_dataset(ds)
 
     def unstack(
-        self, dim: Union[Hashable, Sequence[Hashable], None] = None
+        self,
+        dim: Union[Hashable, Sequence[Hashable], None] = None,
+        fill_value: Any = dtypes.NA,
     ) -> "DataArray":
         """
         Unstack existing dimensions corresponding to MultiIndexes into
@@ -1756,6 +1759,7 @@ def unstack(
         dim : hashable or sequence of hashable, optional
             Dimension(s) over which to unstack. By default unstacks all
             MultiIndexes.
+        fill_value: value to be filled. By default, np.nan
 
         Returns
         -------
@@ -1787,7 +1791,7 @@ def unstack(
         --------
         DataArray.stack
         """
-        ds = self._to_temp_dataset().unstack(dim)
+        ds = self._to_temp_dataset().unstack(dim, fill_value)
         return self._from_temp_dataset(ds)
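
Editor's note: a sketch of the new ``fill_value`` (data invented); missing
level combinations are filled with the given value instead of NaN::

    import pandas as pd
    import xarray as xr

    index = pd.MultiIndex.from_tuples([("a", 0), ("b", 1)], names=["lvl1", "lvl2"])
    da = xr.DataArray([1, 2], dims="z", coords={"z": index})
    da.unstack("z")                # missing (a, 1) and (b, 0) become NaN
    da.unstack("z", fill_value=0)  # ...or 0, which also keeps the integer dtype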

def to_unstacked_dataset(self, dim, level=0):
@@ -2034,44 +2038,69 @@ def fillna(self, value: Any) -> "DataArray":
 
     def interpolate_na(
         self,
-        dim=None,
+        dim: Hashable = None,
         method: str = "linear",
         limit: int = None,
         use_coordinate: Union[bool, str] = True,
+        max_gap: Union[int, float, str, pd.Timedelta, np.timedelta64] = None,
         **kwargs: Any,
     ) -> "DataArray":
-        """Interpolate values according to different methods.
+        """Fill in NaNs by interpolating according to different methods.
 
         Parameters
         ----------
         dim : str
             Specifies the dimension along which to interpolate.
-        method : {'linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
-                  'polynomial', 'barycentric', 'krog', 'pchip',
-                  'spline', 'akima'}, optional
+        method : str, optional
             String indicating which method to use for interpolation:
 
             - 'linear': linear interpolation (Default). Additional keyword
-              arguments are passed to ``numpy.interp``
-            - 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
-              'polynomial': are passed to ``scipy.interpolate.interp1d``. If
-              method=='polynomial', the ``order`` keyword argument must also be
+              arguments are passed to :py:func:`numpy.interp`
+            - 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'polynomial':
+              are passed to :py:func:`scipy.interpolate.interp1d`. If
+              ``method='polynomial'``, the ``order`` keyword argument must also be
              provided.
-            - 'barycentric', 'krog', 'pchip', 'spline', and `akima`: use their
-              respective ``scipy.interpolate`` classes.
-        use_coordinate : boolean or str, default True
+            - 'barycentric', 'krog', 'pchip', 'spline', 'akima': use their
+              respective :py:class:`scipy.interpolate` classes.
+        use_coordinate : bool, str, default True
             Specifies which index to use as the x values in the interpolation
             formulated as `y = f(x)`. If False, values are treated as if
-            equally-spaced along `dim`. If True, the IndexVariable `dim` is
-            used. If use_coordinate is a string, it specifies the name of a
+            equally-spaced along ``dim``. If True, the IndexVariable `dim` is
+            used. If ``use_coordinate`` is a string, it specifies the name of a
             coordinate variable to use as the index.
         limit : int, default None
             Maximum number of consecutive NaNs to fill. Must be greater than 0
-            or None for no limit.
+            or None for no limit. This filling is done regardless of the size of
+            the gap in the data. To only interpolate over gaps less than a given length,
+            see ``max_gap``.
+        max_gap: int, float, str, pandas.Timedelta, numpy.timedelta64, default None
+            Maximum size of gap, a continuous sequence of NaNs, that will be filled.
+            Use None for no limit. When interpolating along a datetime64 dimension
+            and ``use_coordinate=True``, ``max_gap`` can be one of the following:
+
+            - a string that is valid input for pandas.to_timedelta
+            - a :py:class:`numpy.timedelta64` object
+            - a :py:class:`pandas.Timedelta` object
+
+            Otherwise, ``max_gap`` must be an int or a float. Use of ``max_gap`` with unlabeled
+            dimensions has not been implemented yet. Gap length is defined as the difference
+            between coordinate values at the first data point after a gap and the last value
+            before a gap. For gaps at the beginning (end), gap length is defined as the difference
+            between coordinate values at the first (last) valid data point and the first (last) NaN.
+            For example, consider::
+
+                <xarray.DataArray (x: 9)>
+                array([nan, nan, nan,  1., nan, nan,  4., nan, nan])
+                Coordinates:
+                  * x        (x) int64 0 1 2 3 4 5 6 7 8
+
+            The gap lengths are 3-0 = 3; 6-3 = 3; and 8-6 = 2 respectively.
         kwargs : dict, optional
             parameters passed verbatim to the underlying interpolation function
 
         Returns
         -------
-        DataArray
+        interpolated: DataArray
+            Filled in DataArray.
 
         See also
         --------
@@ -2086,6 +2115,7 @@ def interpolate_na(
             method=method,
             limit=limit,
             use_coordinate=use_coordinate,
+            max_gap=max_gap,
             **kwargs,
         )
 
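
Editor's note: the docstring's gap-length example can be carried one step further.
With that array, ``max_gap=3`` fills the interior gap but leaves the boundary NaNs
alone, since linear interpolation does not extrapolate::

    import numpy as np
    import xarray as xr

    da = xr.DataArray(
        [np.nan, np.nan, np.nan, 1.0, np.nan, np.nan, 4.0, np.nan, np.nan],
        dims="x",
        coords={"x": np.arange(9)},
    )
    da.interpolate_na(dim="x", max_gap=3).values
    # -> array([nan, nan, nan,  1.,  2.,  3.,  4., nan, nan])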