Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Typo about "len(arr.dims) <= len(arr.coords) in general" #7929

Closed
4 tasks done
OrionSmedley opened this issue Jun 20, 2023 · 14 comments
Closed
4 tasks done

Typo about "len(arr.dims) <= len(arr.coords) in general" #7929

OrionSmedley opened this issue Jun 20, 2023 · 14 comments
Labels
plan to close May be closeable, needs more eyeballs topic-documentation

Comments

@OrionSmedley
Copy link

OrionSmedley commented Jun 20, 2023

What happened?

Its on the website https://docs.xarray.dev/en/stable/user-guide/terminology.html, but isn't true in general.

What did you expect to happen?

the following code should return true, if the website is correct.

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np
data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})

print(len(data.dims ))
print(len(data.coords))

print( len(data.dims) <= len(data.coords) )

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

@OrionSmedley OrionSmedley added bug needs triage Issue that has not been reviewed by xarray team member labels Jun 20, 2023
@welcome
Copy link

welcome bot commented Jun 20, 2023

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

@headtr1ck
Copy link
Collaborator

Thanks for the issue, always happy to improve the documentation.

However, I don't really see how this implies that you cannot have more coordinates as dimensions. It is written that a coordinate labels a dimension or set of dimensions, so clearly you can have multiple coordinates per dimension.

If you consider dimension-coordinates only, your example is correct.

@headtr1ck headtr1ck added topic-documentation topic-faq and removed bug needs triage Issue that has not been reviewed by xarray team member labels Jun 20, 2023
@dcherian
Copy link
Contributor

@ionsme can you suggest an improvement to the text please? It really helps when someone who's not a maintainer edits these things.

@OrionSmedley
Copy link
Author

@dcherian Ok, I'll think about a good wording a bit later and do that.

@OrionSmedley
Copy link
Author

OrionSmedley commented Jun 21, 2023

@headtr1ck I agree that you can have multiple coordinates per dimension.

However, the documentation explicitly says that you can't have it the other way around.
That is the documentation currently says len(data.dims) <= len(data.coords) is always true.
This code snip bit I provided is a counter example; it explicitly returns false.

I also agree that "if you consider dimension-coordinates only" is correct. In that case, the code given in the documentation should be changed to len(data.dims) <= len(data.dimension-coords) (where data.dimension-coords is a stand in, because I don't know the correct expression to put)

@headtr1ck
Copy link
Collaborator

Oh wow, I completely missed the part where it says:

As a consequence, len(arr.dims) <= len(arr.coords) in general.

This is simply wrong, you are right.
You can have any number of coordinates, from none to many.

I don't think there is a direct way to retrieve dimension coordinates only.

Maybe we should retire the concept of dimension coordinate and purely talk about indexes. At the end it is just a 1D index with the same name as the dimension. I think there are still special use cases for them, like automatic tics in plotting.
@dcherian what do you think? Maybe this should be discussed in the call.

@dcherian
Copy link
Contributor

dcherian commented Jun 21, 2023

concept of dimension coordinate and purely talk about indexes. At the end it is just a 1D index with the same name as the dimension.

Possibly. I think we'll have to reword this section significantly to talk about indexes (xref #6975)

cc @benbovy

@OrionSmedley
Copy link
Author

Maybe we should retire the concept of dimension coordinate and purely talk about indexes. At the end it is just a 1D index with the same name as the dimension. I

Would it be possible to just make all coordinates dimension coordinates?
Because it seems to me that "non dimension coordinates" are just coordinates that have been unnecessarily demoted.

At least, from the user interface point of view.

By demoted I mean that you can't use it for things like selection, unless you use the workaround
ds.sel(MyDimCoord = ds.MyNonDimCoord == 'something')
rather than simply using
ds.sel(MyNonDimCoord =="something")

Also In terms of the graphing you mentioned, could just have the user select which coordinates to plot.

Under the hood,

I was thinking maybe there was some reason for this demoting. Naively though, I feel it should be possible to keep all coordinates as dimension coordinates. Maybe not the most efficient, but just to say it's possible, one could implement it as follows:
Make a pandas table that had the currently existing "dimension coordinate" indexed with the "non-dimension coordinates"
Then, if the user tries to do ds.sel(MyNonDimCoord =="something"), you find the "dimension" coordinate in the pandas table. And then run the ordinary ds.sel(MyDimCoord = "thingCorrespondingToSomething").

@headtr1ck
Copy link
Collaborator

It's not really possible because dimension coordinates have to be 1D, whereas coordinates in general can have any shape.

Also, there are still some special usecases for them, i.e. automatic axis naming in plots, which only works if you have a 1D coordinate and a 1-to-1 relation between coordinate and dimension, aka. a dimension-coordinate.

Indexing (so using da.sel(...)) only works for dimension-coordinates or special indexes (If I remember correctly, there was a PR that would allow indexing using any 1D coordinate, because this is basically the only requirement for simple value based indexing).

@benbovy
Copy link
Member

benbovy commented Jun 23, 2023

@ionsme since a few releases it is possible to set a pandas index for non-dimension 1-d coordinates for using it with .sel(), alignment, etc.

ds2 = ds.set_xindex("MyNonDimCoord")
ds2.sel(MyNonDimCoord="something")

It is just not properly documented yet. I really need to finish #6975.

@dcherian
Copy link
Contributor

Yes it seems like #6975 should close this issue.

@OrionSmedley
Copy link
Author

@benbovy Is there a reason that isn't done by default? So that the user never has to care about whether something is also an index?

@benbovy
Copy link
Member

benbovy commented Jun 26, 2023

If we do that by default this would likely introduce breaking changes as the non-dimension 1-d coordinates would also become automatically aligned by default.

Another reason is that building a (pandas) index is not free, although quite cheap most of the time.

Some people asked the opposite, i.e., if it is possible in Xarray to not build any index by default (or choose another, non-pandas default index type) and/or deactivate automatic alignment. I guess we could expose some global options to control what should be the default behavior?

Users seem to actually care about (the absence of) indexes, which is why they have been promoted as 1st-class citizens in the Xarray data model.

@max-sixty
Copy link
Collaborator

Can we close this now?

@max-sixty max-sixty added the plan to close May be closeable, needs more eyeballs label Sep 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to close May be closeable, needs more eyeballs topic-documentation
Projects
None yet
Development

No branches or pull requests

5 participants