
Convert Variable to IndexVariable or vice versa when renamed #4108

Closed
wants to merge 20 commits

Conversation

@dcherian (Contributor) commented May 28, 2020

@dcherian dcherian changed the title Convert Variable to IndexVariable when renamed to existing dim name Convert Variable to IndexVariable or vice versa during renaming Jan 17, 2021
@dcherian dcherian changed the title Convert Variable to IndexVariable or vice versa during renaming Convert Variable to IndexVariable or vice versa when renamed Jan 17, 2021
@dcherian dcherian marked this pull request as ready for review January 17, 2021 01:41
@dcherian dcherian requested a review from shoyer January 17, 2021 01:43
@keewis (Collaborator) commented Jan 18, 2021

I wonder if it would be better to first "reorganize" all of the existing functions: we currently have rename (and Dataset.rename_dims / Dataset.rename_vars), set_coords, reset_coords, set_index, reset_index and swap_dims, which overlap partially. For example, the code sample from #4417 works if instead of

ds = ds.rename(b='x')
ds = ds.set_coords('x')

we use

ds = ds.set_index(x="b")

and something similar for the code sample in #4107.
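For concreteness, the one-step `set_index` alternative can be sketched like this (a minimal example with a made-up dataset, not the exact data from the linked issues: dimension `x` has no coordinate and `b` is a data variable along it):

```python
import xarray as xr

# hypothetical stand-in for the dataset in the linked issue:
# dimension "x" has no coordinate, "b" is a data variable along it
ds = xr.Dataset({"b": ("x", [10, 20, 30])})

# promote "b" to the dimension coordinate of "x" in one step
indexed = ds.set_index(x="b")

assert "x" in indexed.indexes              # "x" is now backed by an index
assert list(indexed["x"].values) == [10, 20, 30]
assert "b" not in indexed.variables        # "b" was consumed by the move
```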

I believe we currently have these use cases (not sure if that list is complete, though):

  • rename a DataArray → rename
  • rename an existing variable to a name that is not yet in the object → rename / Dataset.rename_vars / Dataset.rename_dims
  • convert a data variable to a coordinate (not a dimension coordinate) → set_coords
  • convert a coordinate (not a dimension coordinate) to a data variable → reset_coords
  • swap an existing dimension coordinate with a coordinate (which may not exist yet) and rename the dimension → swap_dims
  • use an existing coordinate / data variable as a dimension coordinate (do not rename the dimension) → set_index
  • stop using a coordinate as a dimension coordinate and append _ to its name (do not rename the dimension) → reset_index
  • use two existing coordinates / data variables as a MultiIndex → set_index
  • stop using a MultiIndex as a dimension coordinate and use its levels as coordinates → reset_index
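The last two MultiIndex use cases can be sketched as follows (a minimal example with made-up coordinate names and values):

```python
import xarray as xr

# hypothetical dataset: two plain coordinates along dimension "x"
ds = xr.Dataset(
    coords={"one": ("x", ["a", "a", "b"]), "two": ("x", [0, 1, 0])}
)

# combine the two coordinates into a MultiIndex along "x"
mi = ds.set_index(x=["one", "two"])
assert mi.indexes["x"].nlevels == 2

# undo: drop the MultiIndex, keeping its levels as plain coordinates
flat = mi.reset_index("x")
assert "one" in flat.coords and "two" in flat.coords
assert "x" not in flat.indexes
```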

Sometimes, some of these can be emulated by combinations of others, for example:

# x is a dimension without coordinates
assert_identical(ds.set_index({"x": "b"}), ds.swap_dims({"x": "b"}).rename({"b": "x"}))
assert_identical(ds.swap_dims({"x": "b"}), ds.set_index({"x": "b"}).rename({"x": "b"}))

and, with this PR:

assert_identical(ds.set_index({"x": "b"}), ds.set_coords("b").rename({"b": "x"}))
assert_identical(ds.swap_dims({"x": "b"}), ds.rename({"b": "x"}))

which means that it would increase the overlap of rename, set_index, and swap_dims.
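The first pair of equivalences can be checked directly (a sketch with a made-up dataset; the second pair assumes the rename behaviour proposed in this PR, so it is not reproduced here):

```python
import xarray as xr
from xarray.testing import assert_identical

# x is a dimension without coordinates; "b" is a variable along it
ds = xr.Dataset({"b": ("x", [1, 2, 3])})

# set_index is equivalent to swap_dims followed by renaming back
assert_identical(
    ds.set_index({"x": "b"}),
    ds.swap_dims({"x": "b"}).rename({"b": "x"}),
)

# and swap_dims is equivalent to set_index followed by a rename
assert_identical(
    ds.swap_dims({"x": "b"}),
    ds.set_index({"x": "b"}).rename({"x": "b"}),
)
```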

In any case I think we should add a guide which explains which method to pick in which situation (or extend howdoi).

@dcherian (Contributor, author) commented Jan 19, 2021

I wonder if it would be better to first "reorganize" all of the existing functions

I fully agree. However the bug in #4107 is that an invalid xarray object is created (_assert_internal_invariants fails), so I think we should merge a solution even before cleaning up the API.

EDIT: @keewis shall we copy your comment to a new issue?

dcherian and others added 6 commits February 12, 2021 07:35