missing_dims option for aggregation methods like mean and std #5030

Open
jbusecke opened this issue Mar 12, 2021 · 5 comments

Comments

@jbusecke
Contributor

I work a lot with climate model output and often loop over several models, of which some have a 'member' dimension and others don't.

I end up writing many lines like this:

for ds in model_datasets:
    if 'member_id' in ds.dims:
        ds = ds.mean('member_id')

Which often makes for very lengthy code blocks.

I recently noticed that .isel() actually has a nifty keyword argument 'missing_dims', which enables the user to apply isel and it just doesn't do anything when the dimension is not present.

I'd love to be able to do:

for ds in model_datasets:
    ds = ds.mean('member_id', missing_dims='ignore')

Is there a way to implement this generally for xarray aggregation methods (mean/max/min/std/...)? Or is there a reason this should be avoided?
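Until such a keyword exists, the behaviour requested above can be approximated by filtering the dimension names before calling the reduction. The following is only a sketch: `present_dims` is a hypothetical helper, not part of xarray, and its `missing_dims` argument simply copies the three options ("raise", "warn", "ignore") that `.isel()` accepts. It relies on nothing but the dimension names, so it works with anything that exposes them, such as `ds.dims`.

```python
import warnings
from typing import Iterable, List, Union


def present_dims(
    requested: Union[str, Iterable[str]],
    available: Iterable[str],
    missing_dims: str = "raise",
) -> List[str]:
    """Return the requested dimension names that exist in `available`.

    Hypothetical helper; `missing_dims` mirrors the options of `.isel()`.
    """
    if missing_dims not in ("raise", "warn", "ignore"):
        raise ValueError(f"unrecognised missing_dims option: {missing_dims!r}")
    # A single name such as "member_id" is treated as a one-element list.
    if isinstance(requested, str):
        requested = [requested]
    requested = list(requested)
    available = set(available)
    missing = [d for d in requested if d not in available]
    if missing and missing_dims == "raise":
        raise ValueError(f"dimensions {missing} not found")
    if missing and missing_dims == "warn":
        warnings.warn(f"dimensions {missing} not found and will be ignored")
    # Keep the caller's ordering of the dimensions that are present.
    return [d for d in requested if d in available]


# Intended use with xarray objects (left as comments so the sketch runs without xarray):
# for ds in model_datasets:
#     dims = present_dims("member_id", ds.dims, missing_dims="ignore")
#     if dims:
#         ds = ds.mean(dims)
```

The explicit `if dims:` guard skips the reduction entirely when nothing matches, so the sketch does not depend on how a reduction treats an empty list of dimensions.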

@max-sixty
Collaborator

That seems like a reasonable suggestion @jbusecke .

To confirm, would ds.groupby('lat', 'long').mean(...) work? i.e. are the dimensions you don't want to reduce over reliable?

@dcherian
Contributor

Alternatively, you could run the following at the beginning:

# not sure if syntax is right
model_datasets = [
    ds.expand_dims('member_id')
    if "member_id" not in ds.dims else ds
    for ds in model_datasets
]

so all your datasets are consistent.

@TomNicholas
Member

I ran into the same sort of thing today, when trying to loop over many datasets (each of which contained the contents of a node in a datatree...).

I also think that adding a missing_dims argument to all the array reduce methods would be useful, and I plan to have a go at it.

@dcherian
Contributor

dcherian commented Mar 3, 2022

My concern is that we could conceivably add missing_dims to any function that takes a dim argument, which is pretty much the whole API.

For datatree, you could apply the reduction with the set-intersection of provided dims and dims present in a node (if that's the right term).
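The set-intersection idea can be sketched in plain Python as follows. The tree is represented here as a hypothetical mapping from node path to dimension names, which is an assumption made for illustration and not the datatree API. The intersection is computed in an order-preserving way so the result follows the order in which the caller listed the dimensions.

```python
from typing import Dict, Iterable, List, Tuple


def dims_to_reduce(requested: Iterable[str], node_dims: Iterable[str]) -> List[str]:
    """Order-preserving intersection of the requested dims and a node's dims."""
    present = set(node_dims)
    return [d for d in requested if d in present]


# Hypothetical tree: node path -> dimension names held by that node.
tree_dims: Dict[str, Tuple[str, ...]] = {
    "/model_a": ("member_id", "time", "lat", "lon"),
    "/model_b": ("time", "lat", "lon"),
}

# For each node, work out which of the requested dims can actually be reduced.
plan = {path: dims_to_reduce(["member_id"], dims) for path, dims in tree_dims.items()}
```

A node whose entry in `plan` is empty would simply be left unreduced.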

@TomNicholas
Member

TomNicholas commented Mar 3, 2022

> For datatree, you could apply the reduction with the set-intersection of provided dims and dims present in a node (if that's the right term).

I specifically want the user to be able to choose between different behaviours with a flag, but you're right that I could just deal with this at the datatree level instead of here. That would make a fair amount of sense, and it would cover Julius' use-case (via encouraging him to store his models in a tree, so that for ds in model_datasets would become a loop over nodes in a tree).

> My concern is that we could conceivably add missing_dims to any function that takes a dim argument, which is pretty much the whole API.

Do you think that's a problem though? We added keep_attrs to even more of the API than this would cover. Specifically I would want to add it to the REDUCE_METHODS, the NAN_REDUCE_METHODS, and the NAN_CUM_METHODS (so {"all", "any", "max", "min", "mean", "prod", "sum", "std", "var", "median", "cumsum", "cumprod"}).
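As a rough illustration of how one option could be applied uniformly across that set of methods, the sketch below builds a wrapper per method name. It only covers the "ignore" behaviour, and `make_tolerant` and `TOLERANT` are hypothetical names, not xarray internals. The wrappers assume nothing about the object beyond a `dims` attribute and a method of the given name that accepts `dim=`.

```python
from typing import Any, Callable, Dict, Iterable, Union

# Method names copied from the list in the comment above.
METHOD_NAMES = (
    "all", "any", "max", "min", "mean", "prod",
    "sum", "std", "var", "median", "cumsum", "cumprod",
)


def make_tolerant(name: str) -> Callable[..., Any]:
    """Build a function that calls `obj.<name>(dim=...)` over present dims only."""

    def tolerant(obj: Any, dim: Union[str, Iterable[str]], **kwargs: Any) -> Any:
        requested = [dim] if isinstance(dim, str) else list(dim)
        present = set(obj.dims)
        kept = [d for d in requested if d in present]
        # Nothing to reduce over: hand the object back untouched.
        if not kept:
            return obj
        return getattr(obj, name)(dim=kept, **kwargs)

    tolerant.__name__ = f"{name}_ignore_missing"
    return tolerant


# One wrapper per method name, looked up as TOLERANT["mean"](ds, "member_id").
TOLERANT: Dict[str, Callable[..., Any]] = {n: make_tolerant(n) for n in METHOD_NAMES}
```

Generating the wrappers from a single list of names keeps the behaviour identical across methods, which is the consistency concern raised in this exchange.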

I'm fine with doing it either here or in datatree personally.
