xarray.map #4484

kefirbandi · 2020-10-04T04:11:29Z

UPDATE:
Please let me know whether this PR can be considered to be merged.
If not I won't bother trying to fix failed tests such as:

would reformat /home/vsts/work/1/s/xarray/core/map.py
would reformat /home/vsts/work/1/s/xarray/core/dataset.py
would reformat /home/vsts/work/1/s/xarray/tests/test_map.py
Oh no! 💥 💔 💥

Thanks

Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst
New functions/methods are listed in api.rst

I implemented the top-level xarray.map function.
It is a generalization of the Dataset.map method for those functions which take more than one DataArray as input.
The function will be applied to the intersection of the variables in the datasets.

E.g:

f = lambda a, b: b-a
map([ds1, ds2], f)

More details in the docstring.

(I probably messed up something with git as the commits listed below also include my earlier commits not related to this PR. But the list of files is clean, it only includes what I'd like to merge)

Removing additional argument of Dataset._replace

# Conflicts: # xarray/tests/test_dataset.py

…xarray.Dataset.map

Adding See Also entries to docs Listing the new function in whats-new.rst

pep8speaks · 2020-10-04T04:11:38Z

Hello @kefirbandi! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-10-04 04:43:38 UTC

What about changing to alphabetical order?

max-sixty · 2020-10-06T00:37:46Z

Hi @kefirbandi , thanks for the PR!

Could I ask what the common use cases for this would be? If I understand correctly, running map(x, y, lambda x: x + y) is equivalent to x + y.

shoyer · 2020-10-06T05:13:29Z

If we're doing to do this, I would suggest that the right signature is xarray.map(func, *datasets, **optional_kwargs), matching Python's builtin map.

shoyer · 2020-10-06T05:18:12Z

xarray/core/map.py

+        shared_variable_names = set.intersection(*(set(ds.data_vars) for ds in datasets))
+        for k in shared_variable_names:
+            data_arrays = [d[k] for d in datasets]
+            v = maybe_wrap_array(datasets[0][k], func(*(data_arrays + list(args)), **kwargs))


I would prefer not use maybe_wrap_array in new functions. It exists in map only for legacy reasons, but we'd really like to put that sort of functionality into a separate dedicated function (e.g., see the proposal for apply_raw in #1618).

No problem for me. I copied the existing behavior without actually really paying attention as to what it is doing an why.

shoyer · 2020-10-06T05:19:44Z

I think this could make sense as a generalization of Dataset.map. I would indeed be interested to hear more about the specific use cases that motivated this, though.

kefirbandi · 2020-10-06T05:32:04Z

Could I ask what the common use cases for this would be? If I understand correctly, running map(x, y, lambda x: x + y) is equivalent to x + y.

The motivating use case was that I wanted to compute the dot-product of two DataSets (=all of their matching variables).
But in general any other function which is not as simple as x + y could be used here.

max-sixty · 2020-10-07T01:05:37Z

The motivating use case was that I wanted to compute the dot-product of two DataSets (=all of their matching variables).

Thanks, that's a good case.

A couple of thoughts:

Are there many other cases outside of xr.dot which only operate on DataArrays? If not, we could update that function to take a Dataset
Along those lines, I wonder whether this becomes an underlying function — functions which currently operate only on DataArrays could have Dataset inputs run through this.
NB: In many of the computation functions, the logic runs the other way (we define the DataArray function to be the Dataset operation on a single-variable Dataset); having a more consistent way of doing this could make extending functions between DataArrays & Datasets easier.
Maybe jumping ahead — are there functions where the result of func(ds1, ds2) shouldn't be that function mapped over the matching variables?

dcherian · 2020-10-07T01:59:59Z

Are there many other cases outside of xr.dot which only operate on DataArrays? If not, we could update that function to take a Dataset

I think we should do this.

kefirbandi · 2020-10-07T16:56:04Z

* Are there many other cases outside of `xr.dot` which only operate on `DataArray`s? If not, we could update that function to take a `Dataset`

I think it would be a good idea to extend dot to Datasets. However a user may wish to map a custom DataArray function to Dataset.

* Maybe jumping ahead — are there functions where the result of `func(ds1, ds2)` shouldn't be that function mapped over the matching variables?

Not sure of the context of this. In the most general case one can certainly implement any function on ds1 and ds2. Or are you referring to the built-ins such as .dot?

shoyer · 2020-10-07T17:17:48Z

I think some version of xarray.map could indeed be pretty useful more generally. In particular, it could implement each of the ways to "align" variables between different datasets, e.g., should we use the intersection of the variables, or the union, inserting dummy variables with fill values? The dataset_join argument to xarray.apply_ufunc controls this behavior, but it would be nice to have a self-contained version of this.

One challenge for using this internally in xarray (vs. implementing things only on Dataset objects) is that the coordinates on the DataArray objects inside a Dataset can be redundant, so mapping over DataArrays may be less efficient, due to duplicate work on coordinates. In contrast, once everything is in Dataset objects, all the variables/coordinates are pre-aligned and de-deduplicated.

kefirbandi · 2020-10-12T07:36:34Z

If we're doing to do this, I would suggest that the right signature is xarray.map(func, *datasets, **optional_kwargs), matching Python's builtin map.

What I'd like to ensure is a clean separation between the arguments of xarray.map and func. Map has three "own" parameters, like func, datasets and keep_attrs. By using the **kwargs approach we are excluding these parameter names from func. Not saying that is likely that anyone would apply a function with such parameter names. But not impossible either.
Also having a real dict for keyword args (and maybe a list for positional arguments of func) is more explicit.

In my implementation the order of parameters is datasets and then func to match that of Dataset.map with the implicit self.
But probably func and then the datasets is more intuitive.

max-sixty · 2023-09-30T19:02:06Z

It's not great for us as reviewers that we let this sit idle — sorry @kefirbandi ....

But reviewing it now, I would vote to have func(ds1, ds2) work for all our functions, rather than have a new top-level method xr.map. xr.apply_ufunc will do that for us IIUC — i.e. it can operate as an xr.map internally.

If others agree, we could close this, and potentially open an issue for any methods which don't have that interface (which is maybe only xr.dot?)

kefirbandi · 2023-09-30T20:41:55Z

I'm ok with that.

kefirbandi and others added 11 commits March 22, 2019 10:10

Fixing deepcopy for Dataset.attrs + adjusting testcase

8000575

removing extra newlines

73a2cab

Updating whats-new

6926f5b

PEP8

5114beb

Adding attrs-test to the shallow copy testcase

81aae2e

Removing additional argument of Dataset._replace

Rename _attrs -> attrs

68455ca

Merge branch 'master' into master

7c983e0

Merge remote-tracking branch 'upstream/master'

0dabcac

# Conflicts: # xarray/tests/test_dataset.py

Adding the xarray.map function as a multi-variable generalization of …

2e711b3

…xarray.Dataset.map

Some formatting changes

5d17336

Adding See Also entries to docs Listing the new function in whats-new.rst

Docs fixes

ed286b8

kefirbandi added 3 commits October 4, 2020 06:32

Pep8, doctest

53b96f4

attempt to fix imports in __init__.py

8d59089

What about changing to alphabetical order?

fixing doctest

d99b0d8

shoyer reviewed Oct 6, 2020

View reviewed changes

max-sixty closed this Oct 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

xarray.map #4484

xarray.map #4484

kefirbandi commented Oct 4, 2020 •

edited

Loading

pep8speaks commented Oct 4, 2020 •

edited

Loading

max-sixty commented Oct 6, 2020

shoyer commented Oct 6, 2020

shoyer Oct 6, 2020

kefirbandi Oct 6, 2020

shoyer commented Oct 6, 2020

kefirbandi commented Oct 6, 2020

max-sixty commented Oct 7, 2020

dcherian commented Oct 7, 2020

kefirbandi commented Oct 7, 2020

shoyer commented Oct 7, 2020

kefirbandi commented Oct 12, 2020

max-sixty commented Sep 30, 2023

kefirbandi commented Sep 30, 2023

xarray.map #4484

xarray.map #4484

Conversation

kefirbandi commented Oct 4, 2020 • edited Loading

pep8speaks commented Oct 4, 2020 • edited Loading

Comment last updated at 2020-10-04 04:43:38 UTC

max-sixty commented Oct 6, 2020

shoyer commented Oct 6, 2020

shoyer Oct 6, 2020

Choose a reason for hiding this comment

kefirbandi Oct 6, 2020

Choose a reason for hiding this comment

shoyer commented Oct 6, 2020

kefirbandi commented Oct 6, 2020

max-sixty commented Oct 7, 2020

dcherian commented Oct 7, 2020

kefirbandi commented Oct 7, 2020

shoyer commented Oct 7, 2020

kefirbandi commented Oct 12, 2020

max-sixty commented Sep 30, 2023

kefirbandi commented Sep 30, 2023

kefirbandi commented Oct 4, 2020 •

edited

Loading

pep8speaks commented Oct 4, 2020 •

edited

Loading