Iterating over a Dataset iterates only over its data_vars #884
Comments
An early version of xarray actually worked exactly like this, but we switched it to the current functionality. Let me see if I can dig up the relevant issues.... |
Here is one place where I discussed this with myself! Then I merged the functionality you are describing, but at some point switched it back.... |
What would happen to the actual variables? Would |
Ha! What are your thoughts now? |
There is no way we would change this --
I like this change for the reasons you outline above, and those I mentioned in the previous issue. I'm somewhat concerned about breaking a lot of user code, and also am not sure yet what the right solution here looks, because of concerns about breaking duck-type compatibility with dictionaries. Which of these should no longer include coordinates? Can we make things consistent without changing all of them?
We would also need to encourage users to use the low level |
From those, the latter three are trivial to keep. |
I'm pretty sure now that we should (at least) change For the most recent example, consider the API for The question is how to do it: is it worth a deprecation cycle? My proposal:
Note: |
Another question is what to do with If we were starting over, I might change how
Related: over in #1645, I am deprecating the current behavior of If we aren't going to fundamentally change the behavior of |
Do we need to change the behavior of |
No, I think these are guaranteed to be consistent because we inherit from |
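The comment above is cut off before it names the base class. As a rough sketch only, assuming the base class is a mapping ABC along the lines of Python's `collections.abc.Mapping`, the consistency argument looks like this: `keys()`, `values()` and `items()` are mixin methods derived from `__iter__` and `__getitem__`, so they cannot drift apart from plain iteration. `VarsOnly` is a hypothetical toy class, not anything from xarray.

```python
from collections.abc import Mapping


class VarsOnly(Mapping):
    """Hypothetical toy mapping; only the three abstract methods are written by hand."""

    def __init__(self, variables):
        self._variables = dict(variables)

    def __getitem__(self, key):
        return self._variables[key]

    def __iter__(self):
        return iter(self._variables)

    def __len__(self):
        return len(self._variables)


m = VarsOnly({"a": 1, "b": 2})

# keys(), values() and items() are inherited from Mapping and are all
# built on top of __iter__ and __getitem__ defined above.
iterated = list(m)
from_keys = list(m.keys())
from_values = list(m.values())
from_items = list(m.items())

print(iterated, from_keys, from_values, from_items)
```

Under that assumption, changing what `__iter__` yields changes `keys()`, `values()` and `items()` in lockstep.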
So, the behavior of In #2162, you changed `for k, v in dataset.items():` to `dataset = OrderedDict(dataset)` followed by `for k, v in dataset.items():` to avoid the warning, but I am afraid it will cause the unexpected behavior |
This has been a small-but-persistent issue for me for a while. I suspect that my perspective might be dependent on my current outlook, but socializing it here to test if it's secular...
Currently `Dataset.keys()` returns both variables and coordinates (but not its `attrs` keys).

Is this conceptually correct? I would posit that a Dataset is a mapping of keys to variables, and the coordinates contain values that label that data.
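To make the behavior in question concrete, here is a minimal pure-Python sketch. `ToyDataset` and its sample values are hypothetical and only mimic the behavior described in this issue (keys cover data variables and coordinates, never `attrs`); it is not xarray's implementation.

```python
from collections.abc import Mapping
from itertools import chain


class ToyDataset(Mapping):
    """Hypothetical stand-in for a Dataset; not xarray's implementation."""

    def __init__(self, data_vars, coords, attrs=None):
        self.data_vars = dict(data_vars)
        self.coords = dict(coords)
        self.attrs = dict(attrs or {})

    def __getitem__(self, key):
        # Data variables are checked first, then coordinates; attrs are never consulted.
        if key in self.data_vars:
            return self.data_vars[key]
        return self.coords[key]

    def __iter__(self):
        # Behavior described in the issue: data variables and coordinates together.
        return chain(self.data_vars, self.coords)

    def __len__(self):
        return len(self.data_vars) + len(self.coords)


ds = ToyDataset(
    data_vars={"temperature": [11.2, 12.5], "pressure": [1010, 1008]},
    coords={"time": ["2016-01-01", "2016-01-02"]},
    attrs={"title": "toy example"},
)

all_keys = list(ds.keys())
print(all_keys)  # ['temperature', 'pressure', 'time']
print(list(ds.data_vars))  # ['temperature', 'pressure']
```

The coordinate `time` shows up next to the two data variables, which is the part being questioned here.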
So should `Dataset.keys()` instead return just the keys of the Variables?

We're often passing around a dataset as a `Mapping` of keys to values - but then when we run a function across each of the keys, we get something run on both the Variables' keys, and the Coordinate / label's keys.

In Pandas, `DataFrame.keys()` returns just the columns, so that conforms to what we need. While I think the xarray design is in general much better in these areas, this is one area that pandas seems to get correct - and because of the inconsistency between pandas & xarray, we're having to coerce our objects to pandas `DataFrame`s before passing them off to functions that pull out their keys (this is also why we can't just look at `ds.data_vars.keys()` - because it breaks that duck-typing).

Does that make sense?
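A small sketch of the duck-typing problem, using plain dicts as stand-ins. `mean_per_key` and the sample data are hypothetical; the point is only that a generic function which walks `.keys()` cannot tell data from labels.

```python
def mean_per_key(mapping):
    """Average the values stored under every key of any dict-like object."""
    return {key: sum(mapping[key]) / len(mapping[key]) for key in mapping.keys()}


# Hypothetical data: two data variables plus one coordinate that labels them.
data_vars = {"temperature": [10.0, 14.0], "pressure": [1000.0, 1010.0]}
coords = {"station_id": [1, 2]}

# A mapping whose keys cover data variables and coordinates alike.
combined = {**data_vars, **coords}

with_coords = mean_per_key(combined)
only_data = mean_per_key(data_vars)

print(with_coords)  # includes a meaningless mean of station_id
print(only_data)    # only the data variables
```

The function itself is fine for any mapping of keys to values. What decides whether the coordinate leaks into the result is which keys the container chooses to expose.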