`missing_dims` option for aggregation methods like `mean` and `std` #5030

jbusecke · 2021-03-12T23:12:08Z

I work a lot with climate model output and often loop over several models, of which some have a 'member' dimension and others don't.

I end up writing many lines like this:

```python
for ds in model_datasets:
    # only reduce when this model actually has an ensemble dimension
    if 'member_id' in ds.dims:
        ds = ds.mean('member_id')
```

This often makes for very lengthy code blocks.

I recently noticed that `.isel()` actually has a nifty keyword argument, `missing_dims`: with `missing_dims='ignore'` the user can apply `isel`, and it simply does nothing for a dimension that is not present instead of raising an error.
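For reference, a minimal sketch of that existing `isel` behaviour on two toy datasets (the variable name `tas` and the data are made up for illustration):

```python
import numpy as np
import xarray as xr

# Two toy datasets: one with a "member_id" dimension, one without.
with_members = xr.Dataset(
    {"tas": (("member_id", "time"), np.arange(6.0).reshape(2, 3))}
)
without_members = xr.Dataset({"tas": (("time",), np.arange(3.0))})

for ds in (with_members, without_members):
    # missing_dims="ignore" makes isel skip indexers for dimensions the
    # dataset does not have; the default ("raise") would error instead.
    first_member = ds.isel(member_id=0, missing_dims="ignore")
    print(first_member["tas"].dims)
```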

I'd love to be able to do:

```python
for ds in model_datasets:
    ds = ds.mean('member_id', missing_dims='ignore')
```

Is there a way to implement this generally for xarray aggregation methods (mean/max/min/std/...), or is there a reason this should be avoided?
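To make the proposal concrete, here is a minimal sketch that emulates the requested behaviour outside xarray. `reduce_with_missing_dims` is a hypothetical helper, not an xarray function; it assumes the same `"raise"`/`"warn"`/`"ignore"` modes that `isel` uses, filters the requested dims, and then calls a reduction such as `mean` or `std` by name:

```python
import warnings

import numpy as np
import xarray as xr


def reduce_with_missing_dims(ds, method, dim, missing_dims="raise", **kwargs):
    """Apply the reduction ``method`` (e.g. "mean", "std") over ``dim``.

    Hypothetical helper, not an xarray API: it emulates the proposed
    ``missing_dims`` keyword with the same modes that ``isel`` accepts.
    """
    if missing_dims not in ("raise", "warn", "ignore"):
        raise ValueError(f"invalid missing_dims value: {missing_dims!r}")
    requested = [dim] if isinstance(dim, str) else list(dim)
    missing = [d for d in requested if d not in ds.dims]
    if missing and missing_dims == "raise":
        raise ValueError(f"dimensions {missing} do not exist in {list(ds.dims)}")
    if missing and missing_dims == "warn":
        warnings.warn(f"dimensions {missing} do not exist; skipping them")
    present = [d for d in requested if d in ds.dims]
    if not present:
        # Every requested dimension is absent: leave the dataset untouched.
        return ds
    # Look up the named reduction method and call it on the dims that exist.
    return getattr(ds, method)(dim=present, **kwargs)


# Usage: one dataset has "member_id", the other does not.
with_members = xr.Dataset(
    {"tas": (("member_id", "time"), np.arange(6.0).reshape(2, 3))}
)
without_members = xr.Dataset({"tas": (("time",), np.arange(3.0))})

means = [
    reduce_with_missing_dims(ds, "mean", "member_id", missing_dims="ignore")
    for ds in (with_members, without_members)
]
```

Returning the dataset unchanged when none of the requested dimensions is present means the sketch never has to call a reduction with an empty dimension list.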


max-sixty · 2021-03-13T01:30:36Z

That seems like a reasonable suggestion, @jbusecke.

To confirm, would ds.groupby('lat', 'long').mean(...) work? i.e. are the dimensions you don't want to reduce over reliable?
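One possible reading of this question, sketched under assumptions: if the dimensions to keep are reliable, you can reduce over everything else instead of naming `member_id` at all. The dimension names `lat` and `lon` and the helper `mean_over_all_but` are hypothetical:

```python
import numpy as np
import xarray as xr

KEEP = ("lat", "lon")  # assumed names of the dimensions that must survive


def mean_over_all_but(ds, keep=KEEP):
    """Average over every dimension of ``ds`` that is not listed in ``keep``."""
    to_reduce = [d for d in ds.dims if d not in keep]
    return ds.mean(dim=to_reduce) if to_reduce else ds


field = np.ones((2, 3, 4))
ensemble = xr.Dataset({"tas": (("member_id", "lat", "lon"), field)})
single = xr.Dataset({"tas": (("lat", "lon"), field[0])})

results = [mean_over_all_but(ds) for ds in (ensemble, single)]
```

Note that this averages over every other dimension (time included, if present), so it only fits when the keep-list is complete.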

dcherian · 2021-03-13T02:23:38Z

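Alternatively, you could run the following at the beginning:

```python
# Give every dataset a length-1 "member_id" dimension if it lacks one.
# Checking ds.dims (not ds.coords) also covers a dimension without a coordinate.
model_datasets = [
    ds.expand_dims('member_id')
    if "member_id" not in ds.dims else ds
    for ds in model_datasets
]
```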

so all your datasets are consistent.

TomNicholas · 2022-03-03T20:19:39Z

I ran into the same sort of thing today, when trying to loop over many datasets (each of which contained the contents of a node in a datatree...).

I also think that adding a missing_dims argument to all the array reduce methods would be useful, and I plan to have a go at it.

dcherian · 2022-03-03T21:56:55Z

My concern is that we could conceivably add missing_dims to any function that takes a dim argument, which is pretty much the whole API.

For datatree, you could apply the reduction with the set-intersection of provided dims and dims present in a node (if that's the right term).
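A minimal sketch of that set-intersection idea, assuming a plain dict of datasets as a stand-in for tree nodes (this is not the datatree API, and `mean_over_intersection` is a hypothetical name):

```python
import numpy as np
import xarray as xr


def mean_over_intersection(ds, dims):
    """Average ``ds`` over the requested dims that it actually has."""
    present = [d for d in dims if d in ds.dims]  # ordered set-intersection
    return ds.mean(dim=present) if present else ds


# A plain dict of datasets stands in for the nodes of a tree here; this is an
# illustrative assumption, not the datatree API.
nodes = {
    "model_a": xr.Dataset(
        {"tas": (("member_id", "time"), np.arange(6.0).reshape(2, 3))}
    ),
    "model_b": xr.Dataset({"tas": (("time",), np.arange(3.0))}),
}

# Ask for a reduction over two dims; each node only drops the ones it has.
reduced = {
    name: mean_over_intersection(ds, ["member_id", "time"])
    for name, ds in nodes.items()
}
```

Each node is reduced only over the requested dimensions it actually has, so `model_a` loses both `member_id` and `time` while `model_b` loses `time` alone.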

TomNicholas · 2022-03-03T22:35:08Z

> For datatree, you could apply the reduction with the set-intersection of provided dims and dims present in a node (if that's the right term).

I specifically want the user to be able to choose between different behaviours with a flag, but you're right that I could just deal with this at the datatree level instead of here. That would make a fair amount of sense, and it would cover Julius' use-case (by encouraging him to store his models in a tree, so that `for ds in model_datasets` would become a loop over nodes in a tree).

> My concern is that we could conceivably add missing_dims to any function that takes a dim argument, which is pretty much the whole API.

Do you think that's a problem though? We added keep_attrs to even more of the API than this would cover. Specifically I would want to add it to the REDUCE_METHODS, the NAN_REDUCE_METHODS, and the NAN_CUM_METHODS (so {"all", "any", "max", "min", "mean", "prod", "sum", "std", "var", "median", "cumsum", "cumprod"}).

I'm fine with doing it either here or in datatree personally.

TomNicholas added the enhancement label Mar 3, 2022

TomNicholas mentioned this issue Mar 3, 2022

Ignore missing dims when mapping over tree xarray-contrib/datatree#67 (Closed)
