WIP: explicit indexes #2195
xarray/core/coordinates.py
Outdated
@@ -314,6 +315,71 @@ def __unicode__(self):
        return formatting.indexes_repr(self)


def normalize_indexes(indexes, coords, sizes):
(will need =None)
xarray/core/coordinates.py
Outdated
    indexes = {} if indexes is None else dict(indexes)

    # default indexes
    for key in sizes:
Do we want to allow the sizes to be optional, and drawn from the indexes / coords? At the moment they must be supplied in sizes.
I was imagining calling this function after the Dataset/DataArray is filled in with fully specified xarray.Variable objects, so we would already have determined sizes for all dimensions.
Perfect, that makes sense
Eventually (and I now realize from the other PR that this is currently a sketch) we will likely want to check that the sizes are consistent if they're also supplied in indexes or coords.
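A minimal sketch of what such a consistency check could look like. This is not code from the PR: check_sizes is a hypothetical helper, coords are modelled as plain (dims, values) tuples instead of xarray.Variable objects, and each index is assumed to be 1-D and to live on the dimension of the coordinate with the same name.

```python
import numpy as np
import pandas as pd


def check_sizes(indexes, coords, sizes):
    """Hypothetical helper (not part of this PR).

    coords maps names to (dims, values) tuples, standing in for
    xarray.Variable; indexes maps names to pandas.Index objects.
    Raises ValueError on the first inconsistency with sizes.
    """
    def expect(kind, name, dim, length):
        if dim not in sizes:
            raise ValueError(
                '%s %r uses dimension %r, which is missing from sizes'
                % (kind, name, dim))
        if sizes[dim] != length:
            raise ValueError(
                '%s %r has length %d along %r, but sizes says %d'
                % (kind, name, length, dim, sizes[dim]))

    # Every coordinate must match sizes along each of its dimensions.
    for name, (dims, values) in coords.items():
        for dim, length in zip(dims, np.shape(values)):
            expect('coordinate', name, dim, length)

    # Assumption: an index shares the single dimension of the coordinate
    # with the same name, or is itself named after a dimension.
    for name, index in indexes.items():
        dims = coords[name][0] if name in coords else (name,)
        if len(dims) != 1:
            raise ValueError('index %r must be one-dimensional' % name)
        expect('index', name, dims[0], len(index))


# Two coordinates that disagree about the length of 'z' are rejected.
bad_coords = {'x': (('z',), [0, 1, 2]), 'y': (('z',), ['a', 'b'])}
try:
    check_sizes({}, bad_coords, {'z': 3})
except ValueError as err:
    print(err)

# A consistent set of indexes and coords passes silently.
good_coords = {'x': (('z',), [0, 0, 1]), 'y': (('z',), ['a', 'b', 'a'])}
good_indexes = {'x': pd.Index([0, 0, 1]), 'y': pd.Index(['a', 'b', 'a'])}
check_sizes(good_indexes, good_coords, {'z': 3})
```

Raising on the first mismatch keeps the sketch short; a real implementation might prefer to collect all conflicts before reporting.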
xarray/core/variable.py
Outdated
    else:
        variable = Variable(dims, data, *args, fastpath=True)

    return OrderedDict([(name, variable)])
👍
xarray/core/coordinates.py
Outdated
@@ -314,6 +315,71 @@ def __unicode__(self):
        return formatting.indexes_repr(self)


def normalize_indexes(indexes, coords, sizes):
I'm trying to follow up on this PR, but I'm not yet sure what this utility function does.
Can you suggest an example usage of it?
What I currently have in mind is:
- Without MultiIndex
coords = OrderedDict()
coords['x'] = Variable(('x', ), [0, 1, 2])
coords['y'] = Variable(('y', ), ['a', 'b'])
sizes = {'x': 3, 'y': 2}
actual = normalize_indexes({}, coords, sizes)
expected = {'x': coords['x'].to_index(), 'y': coords['y'].to_index()}
- With MultiIndex
coords = OrderedDict()
coords['x'] = Variable(('z', ), [0, 1, 2])
coords['y'] = Variable(('z', ), ['a', 'b'])
sizes = {'x': 6}
actual = normalize_indexes({}, coords, sizes)
expected = {'x': pd.Index([0, 0, 1, 1, 2, 2]),
'y': pd.Index(['a', 'b', 'a', 'b', 'a', 'b'])}
Or do we need a MultiIndex for 'z'?
Your "without MultiIndex" example looks correct, but I think your "with MultiIndex" case should raise an error, because the dimension "z" has inconsistent sizes.
I was thinking something more like:
indexes = {
'x': pandas.Index([0, 0, 1, 1, 2, 2]),
'y': pandas.Index(['a', 'b', 'a', 'b', 'a', 'b']),
}
coords = {
'x': Variable(('z',), [0, 0, 1, 1, 2, 2]),
'y': Variable(('z',), ['a', 'b', 'a', 'b', 'a', 'b']),
}
sizes = {'z': 6}
actual = normalize_indexes(indexes, coords, sizes)
combined_index = pandas.MultiIndex.from_arrays(
[indexes['x'], indexes['y']], names=['x', 'y'])
expected = {'x': combined_index, 'y': combined_index, 'z': combined_index}
The resulting object would support indexing with any of .sel(x=...), .sel(y=...) or .sel(z=...), all based on combined_index.
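To illustrate how a single combined index can serve all three kinds of lookup, here is a rough pandas-only sketch. It does not call normalize_indexes or any xarray API; the mapping from each .sel(...) form to a pandas lookup, including the use of an (x, y) tuple for z, is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

x = pd.Index([0, 0, 1, 1, 2, 2])
y = pd.Index(['a', 'b', 'a', 'b', 'a', 'b'])
combined_index = pd.MultiIndex.from_arrays([x, y], names=['x', 'y'])

# Something like .sel(x=1): positions along 'z' where level 'x' equals 1.
x_positions = np.flatnonzero(combined_index.get_level_values('x') == 1)

# Something like .sel(y='b'): positions along 'z' where level 'y' equals 'b'.
y_positions = np.flatnonzero(combined_index.get_level_values('y') == 'b')

# Something like .sel(z=(1, 'b')): a full (x, y) key picks a single position.
z_position = combined_index.get_loc((1, 'b'))

print(x_positions, y_positions, z_position)
```

Because every lookup resolves to integer positions along the shared dimension z, the same positions can then be used to index any variable defined on z.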
Hello @shoyer! Thanks for updating the PR.
I've pushed a few commits with some additional progress here. Here's my current plan:
I guess we can close this one now that #5692 is merged?
Some utility functions that should be useful for #1603
Still very much a work in progress -- it would be great if someone has time to finish writing any of these in another PR!