
nd-rolling #4219
Merged: 27 commits, Aug 8, 2020

Conversation

@fujiisoup (Member) commented Jul 12, 2020

  • Closes Convolution operation #4196
  • Tests added
  • Passes isort -rc . && black . && mypy . && flake8
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

The implementation of nd-rolling turned out to be straightforward.
The core part is implemented, but I am wondering what the best API is while keeping it backward compatible.

Obviously, it should basically look like

da.rolling(x=3, y=3).mean()

A problem is the other parameters, center and min_periods.
In principle, they can depend on the dimension;
for example, we could have center=True for x but not for y.

So maybe we should allow a dictionary for them?

da.rolling(x=3, y=3, center={'x': True, 'y': False}, min_periods={'x': 1, 'y': None}).mean()

The same thing happens for the .construct method.

da.rolling(x=3, y=3).construct(x='x_window', y='y_window', stride={'x': 2, 'y': 1})

I'm afraid this dictionary argument may be a bit too verbose.

Does anyone have another idea?

@pytest.mark.parametrize("center", (True, False))
@pytest.mark.parametrize("min_periods", (None, 1))
def test_ndrolling_reduce(da, center, min_periods):
    rolling_obj = da.rolling(time=3, a=2, center=center, min_periods=min_periods)
@fujiisoup (Member Author):

This is a basic test for the new nd-rolling

@max-sixty (Collaborator):

This looks very promising; I'm surprised it was possible without more code. I'm being slow, but where is the nd-rolling algorithm? I had thought bottleneck didn't support more than one dimension (https://bottleneck.readthedocs.io/en/latest/bottleneck.move.html), and that we'd have to implement our own in numbagg (which would be very possible).

@max-sixty (Collaborator) commented Jul 12, 2020

Re the API, I think the dict is probably the best option, although it does add complexity: the arguments become differently typed depending on one vs. multiple dimensions.

One alternative is to allow fluent args, like:

(
    da
    .rolling(x=3, center=True)
    .rolling(y=5, min_periods=2)
    .mean()
)

...but does that then seem like the second rolling is operating on the result of the first?

@fujiisoup (Member Author):

Hi @max-sixty

One alternative is to allow fluent args, like:
...but does that then seem like the second rolling is operating on the result of the first?

I hadn't thought of that until just now. But yes, it does sound like a repeated rolling operation.

I'm being slow, but where is the nd-rolling algo? I had thought bottleneck didn't support more than one dimension?

No, bottleneck doesn't support it. With nd-rolling, we need to use numpy reductions.
The skipna=True path is currently slow, but it can be improved by replacing nan values before applying the stride trick.
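A minimal numpy sketch of this idea (not the PR's code; `sliding_window_view`, available in numpy >= 1.20, stands in for the stride trick here):

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0, 4.0])

# Slow path: build the strided windows first, then apply a nan-aware
# reduction, which has to mask/copy the much larger windowed view.
windows = np.lib.stride_tricks.sliding_window_view(a, 2)
slow = np.nansum(windows, axis=-1)

# Faster path: replace nan *before* the stride trick (0 is the identity
# of sum), then apply the plain reduction directly on the view.
filled = np.where(np.isnan(a), 0.0, a)
fast = np.lib.stride_tricks.sliding_window_view(filled, 2).sum(axis=-1)

print(slow, fast)  # identical results
```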

@fujiisoup (Member Author):

Another API concern: we currently use min_periods, which implicitly assumes the one-dimensional case.

For n dimensions, I think a min_counts argument, as in bottleneck, is more appropriate: it sets a lower bound on the number of non-missing entries in the n-dimensional window.

Even if we keep min_periods, we could disallow a per-dimension argument and keep it a scalar.

@pep8speaks commented Jul 12, 2020

Hello @fujiisoup! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-08-07 22:59:00 UTC

if hasattr(axis, "__len__"):
    for ax, win, cen in zip(axis, window, center):
        a = rolling_window(a, ax, win, cen, fill_value)
    return a
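The snippet above builds the n-d window by applying the 1-d routine once per axis; the effect can be sketched with numpy's `sliding_window_view` (illustration only, not the PR's code):

```python
import numpy as np

a = np.arange(12.0).reshape(3, 4)

# Each pass windows one axis and appends one window dimension at the
# end, so iterating over axes yields the full n-d rolling window.
v = np.lib.stride_tricks.sliding_window_view(a, 2, axis=0)  # (2, 4, 2)
v = np.lib.stride_tricks.sliding_window_view(v, 2, axis=1)  # (2, 3, 2, 2)

# v[i, j] is the 2x2 block of `a` whose top-left corner is (i, j).
print(v[0, 0])
```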
@fujiisoup (Member Author):

I hope this is true, though I didn't check it.
A suspicious part is da.concatenate, which I expect does not copy the original (strided) array.

@fujiisoup (Member Author):

I noticed that ghosting breaks my assumption. I need to update the algorithm here.

@dcherian (Contributor):

I think the dictionary is OK. We can allow scalars for center, stride, etc., in which case the scalar applies to all dimensions; this would preserve backward compatibility.

A scalar for min_counts also seems OK to me. We can update it if someone requests it.
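The scalar-or-mapping convention could be normalized with a small helper like this (hypothetical; `per_dim` is not part of the PR):

```python
# Hypothetical helper: broadcast a scalar argument to every rolling
# dimension, or fill in missing keys when a mapping is given.
def per_dim(arg, dims, default):
    if isinstance(arg, dict):
        return {d: arg.get(d, default) for d in dims}
    return {d: arg for d in dims}

dims = ["x", "y"]
print(per_dim(True, dims, False))         # scalar broadcasts to all dims
print(per_dim({"x": True}, dims, False))  # mapping keeps per-dim values
```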

@fujiisoup fujiisoup marked this pull request as ready for review July 14, 2020 00:26
@fujiisoup (Member Author):

I think it is now ready for review, though I'm sure the tests miss a lot of edge cases.
We can fix those as they are pointed out.

@fujiisoup (Member Author):

A possible future improvement is better nan-reduction methods for nd-rolling.
Currently, we just use numpy's nan-reductions, which are memory-consuming for strided arrays.

This can be solved by replacing nan with an appropriate value and then applying a non-nan reduction;
e.g.,

da.rolling(x=2, y=3).construct(x='xw', y='yw').sum(['xw', 'yw'])

should give the same result as

da.rolling(x=2, y=3).construct(x='xw', y='yw', fill_value=0).sum(['xw', 'yw'], skipna=False)

and the latter is much more memory efficient.

I'd like to leave this improvement to a future PR.
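A quick check of the equivalence claimed above (a sketch; note that for data which itself contains nan, those values would also need replacing first, e.g. with fillna, which is the proposed improvement):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=("x", "y"))

# Default construct pads the window edges with nan, so the skipna sum
# must scan for nan across the whole windowed (strided) array ...
a = da.rolling(x=2, y=3).construct(x="xw", y="yw").sum(["xw", "yw"])

# ... while padding with 0 (the identity of sum) allows skipna=False,
# which avoids the nan-masking copy of the windowed array.
b = da.rolling(x=2, y=3).construct(x="xw", y="yw", fill_value=0).sum(
    ["xw", "yw"], skipna=False
)

print(bool((a == b).all()))
```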

@fujiisoup (Member Author):

I got a type-checking error, only in CI and not locally, from code that I didn't change.

@fujiisoup (Member Author):

Could anyone kindly review this?

@max-sixty (Collaborator) left a review:

Looks really good, great implementation imo

@@ -802,7 +802,7 @@ def rolling(
     Minimum number of observations in window required to have a value
     (otherwise result is NA). The default, None, is equivalent to
     setting min_periods equal to the size of the window.
-    center : boolean, default False
+    center : boolean, or a mapping, default False
@max-sixty (Collaborator):

Types on the signature should change too (indeed, I'm surprised mypy doesn't fail)

for k in self._attributes
if getattr(self, k, None) is not None
for k in list(self.dim) + list(self.window)
@max-sixty (Collaborator):

Isn't this a list of ints? Is it going to do getattr(da, 3)?

xarray/core/rolling.py (resolved)
 a = np.pad(a, pads, mode="constant", constant_values=fill_value)
-return _rolling_window(a, window, axis)
+for ax, win in zip(axis, window):
+    a = _rolling_window(a, win, ax)
@max-sixty (Collaborator):

This is very clever! I spent a while trying to figure out how it works...

@pytest.mark.parametrize("ds", (1,), indirect=True)
@pytest.mark.parametrize("center", (True, False))
@pytest.mark.parametrize("min_periods", (None, 1))
@pytest.mark.parametrize("name", ("sum", "mean", "max"))
@max-sixty (Collaborator):

Will this test fail for something like std?
IIUC, iteratively applying the function to the array for each dim only works for some functions.

Which is probably fine, no need to write more tests imo, though it may be worth making a note.
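The suspicion here can be demonstrated directly with plain numpy: max decomposes over axes, but std does not.

```python
import numpy as np

a = np.array([[0.0, 10.0], [0.0, 10.0]])

# max over the whole block equals max over rows, then over columns ...
assert a.max(axis=0).max() == a.max()

# ... but std does not decompose: each column is constant (std 0), so
# the second pass sees [0, 0], while the joint std of {0, 10} is 5.
print(a.std(axis=0).std(), a.std())  # 0.0 vs 5.0
```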

@fujiisoup (Member Author):

In this case, many other functions should give the same result.
With nans at random positions, the results would differ.
I added a few more tests, including std, but I don't think it is necessary to test all the nanops.

@fujiisoup (Member Author):

Thanks @max-sixty for the review ;)
I'll work on the updates over the next few days.

assert_allclose(actual, expected)
assert actual.dims == expected.dims

# Do it in the oposite order
@max-sixty (Collaborator):

Suggested change
# Do it in the oposite order
# Do it in the opposite order

@max-sixty (Collaborator):

I'm still a bit confused. How does the "roll over each dimension in turn" approach equal the "roll over both dimensions together" approach with a function like std? Here's a proposed counterexample:

import xarray as xr
import numpy as np

da = xr.DataArray(np.asarray([[0,10,0],[0,10,0], [0,10,0]]), dims=list('xy'))
print(da)

<xarray.DataArray (x: 3, y: 3)>
array([[ 0, 10,  0],
       [ 0, 10,  0],
       [ 0, 10,  0]])
Dimensions without coordinates: x, y

x_std = da.rolling(dict(x=2)).std()
print(x_std)

<xarray.DataArray (x: 3, y: 3)>
array([[nan, nan, nan],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
Dimensions without coordinates: x, y

x_then_y_std = x_std.rolling(dict(y=2)).std()
print(x_then_y_std)

<xarray.DataArray (x: 3, y: 3)>
array([[nan, nan, nan],
       [nan,  0.,  0.],
       [nan,  0.,  0.]])
Dimensions without coordinates: x, y

combined_std = da.rolling(dict(x=2, y=2)).std()
print(combined_std)

<xarray.DataArray (x: 3, y: 3)>
array([[nan, nan, nan],
       [nan,  5.,  5.],
       [nan,  5.,  5.]])
Dimensions without coordinates: x, y

@dcherian (Contributor) commented Aug 7, 2020

I think it's more like

da.rolling(x=2).construct("xr").rolling(y=2).construct("yr").std(["xr", "yr"])

but that gives

array([[0., 5., 5.],
       [0., 5., 5.],
       [0., 5., 5.]])

@fujiisoup (Member Author):

Thanks @max-sixty.

You are completely correct. Since the tests passed, I was fooling myself.

The reason was that the dataset I used for the test did not have time and x simultaneously,
so I was testing only 1d-rolling, not 2d-rolling.

Fixed. Now the test correctly fails for mean and std, but passes with max and sum.

@max-sixty (Collaborator):

Great! Thanks @fujiisoup !

Ready to go from my POV

@fujiisoup fujiisoup merged commit 1d3dee0 into pydata:master Aug 8, 2020
@fujiisoup (Member Author):

@max-sixty, thanks for the review. Merged.

@fujiisoup fujiisoup deleted the nd_rolling branch August 8, 2020 07:23
@keewis keewis mentioned this pull request Aug 8, 2020
dcherian added a commit to rpgoldman/xarray that referenced this pull request Aug 14, 2020
* 'master' of github.com:pydata/xarray: (260 commits)
  Increase support window of all dependencies (pydata#4296)
  Implement interp for interpolating between chunks of data (dask) (pydata#4155)
  Add @mathause to current core developers. (pydata#4335)
  install sphinx-autosummary-accessors from conda-forge (pydata#4332)
  Use sphinx-accessors-autosummary (pydata#4323)
  ndrolling fixes (pydata#4329)
  DOC: fix typo argmin -> argmax in DataArray.argmax docstring (pydata#4327)
  pin sphinx to 3.1(pydata#4326)
  nd-rolling (pydata#4219)
  Implicit dask import 4164 (pydata#4318)
  allow customizing the inline repr of a duck array (pydata#4248)
  silence the known docs CI issues (pydata#4316)
  enh: fixed pydata#4302 (pydata#4315)
  Remove all unused and warn-raising methods from AbstractDataStore (pydata#4310)
  Fix map_blocks example (pydata#4305)
  Fix docstring for missing_dims argument to isel methods (pydata#4298)
  Support for PyCharm remote deployment (pydata#4299)
  Update map_blocks and map_overlap docstrings (pydata#4303)
  Lazily load resource files (pydata#4297)
  warn about the removal of the ufuncs (pydata#4268)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Aug 15, 2020
dcherian added a commit to dcherian/xarray that referenced this pull request Aug 16, 2020
@max-sixty max-sixty mentioned this pull request Apr 17, 2022
Successfully merging this pull request may close these issues.

Convolution operation
4 participants