xarray.map #4484
Removing additional argument of Dataset._replace
# Conflicts: # xarray/tests/test_dataset.py
…xarray.Dataset.map
Adding See Also entries to docs
Listing the new function in whats-new.rst
Hello @kefirbandi! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-10-04 04:43:38 UTC
What about changing to alphabetical order?
Hi @kefirbandi , thanks for the PR! Could I ask what the common use cases for this would be? If I understand correctly, running |
If we're going to do this, I would suggest that the right signature is |
shared_variable_names = set.intersection(*(set(ds.data_vars) for ds in datasets))
for k in shared_variable_names:
    data_arrays = [d[k] for d in datasets]
    v = maybe_wrap_array(datasets[0][k], func(*(data_arrays + list(args)), **kwargs))
I would prefer not to use maybe_wrap_array in new functions. It exists in map only for legacy reasons, but we'd really like to put that sort of functionality into a separate dedicated function (e.g., see the proposal for apply_raw in #1618).
No problem for me. I copied the existing behavior without really paying attention to what it is doing and why.
I think this could make sense as a generalization of |
The motivating use case was that I wanted to compute the dot product of two Datasets (i.e., of all their matching variables).
Thanks, that's a good case. A couple of thoughts:
I think we should do this. |
I think it would be a good idea to extend dot to Datasets. However, a user may still wish to map a custom DataArray function over Datasets.
Not sure of the context of this. In the most general case one can certainly implement any function on ds1 and ds2. Or are you referring to the built-ins such as .dot? |
I think some version of One challenge for using this internally in xarray (vs. implementing things only on Dataset objects) is that the coordinates on the DataArray objects inside a Dataset can be redundant, so mapping over DataArrays may be less efficient, due to duplicate work on coordinates. In contrast, once everything is in Dataset objects, all the variables/coordinates are pre-aligned and deduplicated.
What I'd like to ensure is a clean separation between the arguments of In my implementation the order of parameters is |
It's not great for us as reviewers that we let this sit idle — sorry @kefirbandi .... But reviewing it now, I would vote to have If others agree, we could close this, and potentially open an issue for any methods which don't have that interface (which is maybe only |
I'm ok with that. |
UPDATE:
Please let me know whether this PR can be considered to be merged.
If not I won't bother trying to fix failed tests such as:
Thanks
whats-new.rst
api.rst
I implemented the top-level xarray.map function.
It is a generalization of the Dataset.map method for those functions which take more than one DataArray as input.
The function will be applied to the intersection of the variables in the datasets.
E.g:
More details in the docstring.
(I probably messed up something with git as the commits listed below also include my earlier commits not related to this PR. But the list of files is clean, it only includes what I'd like to merge)