More annotations #3177

crusaderky · 2019-08-02T14:49:50Z

No description provided.

crusaderky · 2019-08-02T14:51:55Z

xarray/core/dataset.py

            return self._drop_vars(labels, errors=errors)
        else:
+            if utils.is_scalar(labels):


I don't think I fully understand the use case where dims is not None, where apparently you are allowed to feed into labels a whole DataArray. It also makes me nervous that there's a single unit test, not together with the other tests for Dataset.drop, that triggers the use case.

Isn't this for the case you pass a single label; what's the case with passing a whole DataArray?

In [12]: da = xr.DataArray(data=np.ones((2, 3)), dims=['x', 'y'], coords=dict(x=list('ab'))) In [13]: da Out[13]: <xarray.DataArray (x: 2, y: 3)> array([[1., 1., 1.], [1., 1., 1.]]) Coordinates: * x (x) <U1 'a' 'b' Dimensions without coordinates: y In [14]: da.drop('a', dim='x') Out[14]: <xarray.DataArray (x: 1, y: 3)> array([[1., 1., 1.]]) Coordinates: * x (x) <U1 'b' Dimensions without coordinates: y

This is the problem. The test is expecting ds.y to be cast to a generic array-like by drop.

xarray/xarray/tests/test_dataset.py

Line 4245 in dba85bd

expected = 2 * ds.drop(ds.y, dim='y')

The fact is that iter(DataArray) yields scalar DataArrays, not the elements of the DataArray (unlike iter(numpy.ndarray))

So the lines

if isinstance(labels, str) or not isinstance(labels, Iterable): labels = {labels} else: labels = set(labels)

fall apart when labels is a DataArray.
To make it work I should change it to something like:

if isinstance(labels, str) or not isinstance(labels, Iterable): labels = {labels} elif isinstance(labels, DataArray): labels = labels.values else: labels = set(labels)

The problem is that the logic above is EVERYWHERE. None of the many, many functions that expect "one or more dimension names" as an input cope (or are tested against) a DataArray input.

I lean towards removing the option entirely, and changing the unit test from

ds.drop(ds.y, dim='y')

to

ds.drop(ds.y.values, dim='y')

This is also because passing a DataArray is incredibly counter-intuitive; it took me a long time to figure out that ds.drop(ds.y, dim='y') is not trying to drop the y variable.

Note that I wrote labels = labels.values and not labels = set(labels.values). That would cause a huge performance hit when values has many elements.

I think the problem is due to the fact that the same method drop() performs two fundamentally different things - drop variables or elements from the variables - and in the latter case there are plenty of similar methods that DO accept a DataArray, e.g. sel(), where(), etc.

I think I cracked it - please review

Yes, it would be ideal to have separate functions for dropping variables rather than elements along an array dimension.

We are also considering refactoring this function over in #3128, I would be very interested in your thoughts!

dcherian · 2019-08-02T14:52:11Z

Like a machine!

max-sixty · 2019-08-02T15:41:17Z

Like a machine!

Living up to the headshot; a rampage of typing...

Thanks @crusaderky !

pep8speaks · 2019-08-02T16:30:29Z

Hello @crusaderky! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-08-05 22:11:48 UTC

crusaderky · 2019-08-02T18:57:50Z

Ready for review and merge

max-sixty

Great! Thanks for being so thorough.

I put added some small questions, e.g. around is_scalar. I also added a couple of comments we can elide for now.

Assuming we're good on is_scalar, I'll go ahead and merge.

max-sixty · 2019-08-02T19:48:02Z

xarray/core/dataset.py

+            # Don't cast to set, as it would harm performance when labels
+            # is a large numpy array
+            if utils.is_scalar(labels):
+                labels = [labels]


Minor, no need to change now: could we remove the identical operation from the DataArray wrapper of this at https://github.com/pydata/xarray/pull/3177/files#diff-ffd3597671590bab245b3193affa62b8R1798 ?

max-sixty · 2019-08-02T19:50:04Z

xarray/core/dataset.py

@@ -3338,8 +3385,10 @@ def drop_dims(self, drop_dims, *, errors='raise'):
        if errors not in ['raise', 'ignore']:
            raise ValueError('errors must be either "raise" or "ignore"')

-        if utils.is_scalar(drop_dims):
+        if isinstance(drop_dims, str) or not isinstance(drop_dims, Iterable):


What's the difference between this and is_scalar? I had thought is_scalar was attempting to centralize this logic in a single function

is_scalar doesn't cope with non-string Hashables, e.g. enum.Enum. I'm uncomfortable touching it without going through the exercise of thoroughly reviewing everything that invokes it.

Actually, rereading it, it does cope with Enum. However, most functions that today use if isinstance(drop_dims, str) or not isinstance(drop_dims, Iterable): won't transparently cope with 0-dimensional numpy arrays or, even worse, 0-dimensional DataArrays.

max-sixty · 2019-08-02T19:55:13Z

xarray/core/dataset.py

+        ...
+
+    # Drop index labels along dimension
+    @overload  # noqa: F811


Great! You're leading our understanding of python typing here...

max-sixty · 2019-08-02T19:57:04Z

xarray/core/dataset.py

@@ -3351,15 +3400,15 @@ def drop_dims(self, drop_dims, *, errors='raise'):
                        for d in v.dims if d in drop_dims)
        return self._drop_vars(drop_vars)

-    def transpose(self, *dims):
+    def transpose(self, *dims: Hashable) -> 'Dataset':


One day (don't change now), we could have this as returning its class, for those who inherit from Dataset. And I'm not sure how to do that with python typing; in other languages it's -> Self

You can do this in Python by defining a type variable, T = TypeVar('T') and using it to annotate self:

def transpose(self: T, *dims: Hashable) -> T:

It is possible, yes, but IMHO detracts from readibility. The question here is, do people actually subclass from DataArray and Dataset? I've always found xarray rather hostile to subclassing, chiefly due to the heavy interaction between DataArray and Dataset.

max-sixty · 2019-08-02T19:59:29Z

xarray/core/dataset.py

            dims = set(self.dims)
+        elif isinstance(dim, str) or not isinstance(dim, Iterable):


Same here re this vs is_scalar

max-sixty · 2019-08-02T20:03:14Z

xarray/tests/test_dataarray.py

@@ -1904,9 +1904,9 @@ def test_drop_coordinates(self):
        assert_identical(actual, expected)

        with raises_regex(ValueError, 'cannot be found'):
-            arr.drop(None)
+            arr.drop('w')


What happens if None is passed? My quick test suggest this test should still pass, is that right?

None technically works, but it's not a valid use case. hash(None) works and isinstance(None, Hashable) returns True, but mypy will complain if you pass None where a Hashable is required, as None is treated as a special case. Also, xarray uses None everywhere to mark optional parameters, and in this case it asks the user to explicitly provide a value.

max-sixty · 2019-08-02T20:07:05Z

xarray/tests/test_dataset.py

@@ -2000,6 +2000,12 @@ def test_drop_index_labels(self):
        expected = data.isel(x=slice(0, 0))
        assert_identical(expected, actual)

+        # DataArrays as labels are a nasty corner case as they are not
+        # Iterable[Hashable] - DataArray.__iter__ yields scalar DataArrays.
+        actual = data.drop(DataArray(['a', 'b', 'c']), 'x', errors='ignore')


Great, it's good this now works

crusaderky · 2019-08-05T22:28:39Z

Ready for merge if everybody's happy with the current state

max-sixty · 2019-08-06T01:19:32Z

OK, let's merge and we can iterate if needed from here. Thanks as ever @crusaderky

@shoyer - do you have any thoughts on is_scalar from this thread?

* master: pyupgrade one-off run (pydata#3190) mfdataset, concat now support the 'join' kwarg. (pydata#3102) reduce the size of example dataset in dask docs (pydata#3187) add climpred to related-projects (pydata#3188) bump rasterio to 1.0.24 in doc building environment (pydata#3186) More annotations (pydata#3177) Support for __array_function__ implementers (sparse arrays) [WIP] (pydata#3117) Internal clean-up of isnull() to avoid relying on pandas (pydata#3132) Call darray.compute() in plot() (pydata#3183) BUG: fix + test open_mfdataset fails on variable attributes with list… (pydata#3181)

Annotations for Dataset.drop et al

74b4190

crusaderky commented Aug 2, 2019

View reviewed changes

Guido Imperiale added 2 commits August 2, 2019 16:49

Annotations for Dataset.interpolate et al

3eaa6b2

Dataset.drop(DataArray)

d1fc70c

crusaderky changed the title ~~WIP: More annotations~~ More annotations Aug 2, 2019

Guido Imperiale added 5 commits August 2, 2019 17:31

flake8

66cf592

Merge remote-tracking branch 'upstream/master' into annotations

0c9f1a9

trivial

197ddc2

@overload Dataset.drop

355c390

docstring tweaks

fdc205b

max-sixty approved these changes Aug 2, 2019

View reviewed changes

Clean up redundant code

816e13c

max-sixty merged commit d1935ff into pydata:master Aug 6, 2019

crusaderky deleted the annotations branch August 9, 2019 08:50

gwgundersen mentioned this pull request Aug 15, 2019

Support keyword API for Dataset.drop #3128

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More annotations #3177

More annotations #3177

crusaderky commented Aug 2, 2019

crusaderky Aug 2, 2019

max-sixty Aug 2, 2019

crusaderky Aug 2, 2019 •

edited

Loading

crusaderky Aug 2, 2019 •

edited

Loading

crusaderky Aug 2, 2019

crusaderky Aug 2, 2019

crusaderky Aug 2, 2019

shoyer Aug 4, 2019

dcherian commented Aug 2, 2019

max-sixty commented Aug 2, 2019

pep8speaks commented Aug 2, 2019 •

edited

Loading

crusaderky commented Aug 2, 2019

max-sixty left a comment

max-sixty Aug 2, 2019

crusaderky Aug 5, 2019

max-sixty Aug 2, 2019

crusaderky Aug 5, 2019

crusaderky Aug 5, 2019

max-sixty Aug 2, 2019

max-sixty Aug 2, 2019

shoyer Aug 4, 2019

crusaderky Aug 5, 2019

max-sixty Aug 2, 2019

max-sixty Aug 2, 2019

crusaderky Aug 5, 2019 •

edited

Loading

max-sixty Aug 2, 2019

crusaderky commented Aug 5, 2019

max-sixty commented Aug 6, 2019

		dims = set(self.dims)
		elif isinstance(dim, str) or not isinstance(dim, Iterable):

More annotations #3177

More annotations #3177

Conversation

crusaderky commented Aug 2, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

crusaderky Aug 2, 2019 • edited Loading

Choose a reason for hiding this comment

crusaderky Aug 2, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dcherian commented Aug 2, 2019

max-sixty commented Aug 2, 2019

pep8speaks commented Aug 2, 2019 • edited Loading

Comment last updated at 2019-08-05 22:11:48 UTC

crusaderky commented Aug 2, 2019

max-sixty left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

crusaderky Aug 5, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

crusaderky commented Aug 5, 2019

max-sixty commented Aug 6, 2019

crusaderky Aug 2, 2019 •

edited

Loading

crusaderky Aug 2, 2019 •

edited

Loading

pep8speaks commented Aug 2, 2019 •

edited

Loading

crusaderky Aug 5, 2019 •

edited

Loading