At the moment, we have source as a dim, but only allow a single source in a dataset. That actually gives us the worst of all worlds: only a single source, incompatibilities when doing arithmetic with different sources, and more dimensions, which always hurts somewhat.
We have three options for what to do instead:
A single source, in attrs.
Advantages:
One less dimension.
Direct arithmetic with different sources.
Disadvantages:
Multiple sources always have to be held in multiple datasets. When working with many sources, for example when plotting differences between sources, for loops have to be used.
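To illustrate the for-loop disadvantage, here is a minimal sketch assuming plain xarray DataArrays with the source stored in attrs. The source names, area codes, and values are made up for illustration.

```python
import xarray as xr

areas = ["DEU", "FRA"]

# One DataArray per source; the source name lives only in attrs (hypothetical data).
by_source = {
    name: xr.DataArray(
        values, dims="area", coords={"area": areas}, attrs={"source": name}
    )
    for name, values in {
        "FAO": [1.0, 2.0],
        "Andrew": [1.5, 2.5],
        "Other": [0.5, 3.0],
    }.items()
}

# Direct arithmetic works, because no source coordinate gets in the way.
total = by_source["FAO"] + by_source["Andrew"]

# Comparing every source against a reference needs an explicit loop.
reference = by_source["FAO"]
differences = {}
for name, da in by_source.items():
    if name == "FAO":
        continue
    differences[name] = da - reference
```

Note that xarray drops attrs in arithmetic by default, so the results no longer record which sources went into them.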
One or more sources, in dim
Advantages:
Select by source just like by area.
Explicit arithmetic with different sources (e.g. da1.loc[{'source': 'FAO'}] + da2.loc[{'source': 'Andrew'}])
Multiple sources can be held in a single dataset. Working with many sources becomes fluent.
Disadvantages:
When working with a single source, the additional dimension makes representations larger, makes tabular display in PyCharm more difficult, etc.
Explicit arithmetic with different sources (more to type in the "working with two different sources, I know what I'm doing" use case)
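The behaviour described for this option can be sketched with plain xarray. This assumes a source dimension with string labels; the area codes and values are made up for illustration.

```python
import xarray as xr

areas = ["DEU", "FRA"]
da1 = xr.DataArray(
    [[1.0, 2.0]],
    dims=("source", "area"),
    coords={"source": ["FAO"], "area": areas},
)
da2 = xr.DataArray(
    [[1.5, 2.5]],
    dims=("source", "area"),
    coords={"source": ["Andrew"], "area": areas},
)

# Naive arithmetic aligns on the source coordinate (inner join by default),
# so two arrays with different sources have no labels in common.
naive = da1 + da2

# Selecting a single source on each side removes the source dimension first,
# so the remaining area dimension lines up.
explicit = da1.loc[{"source": "FAO"}] + da2.loc[{"source": "Andrew"}]

# Several sources in one array: differences to a reference source without a loop.
combined = xr.concat([da1, da2], dim="source")
differences = combined - combined.sel(source="FAO")
```

Here naive ends up with a source dimension of length zero, which is the incompatibility mentioned at the top, while explicit and differences contain the expected values.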
Hybrid, both allowed
Advantages:
When working with one source, all advantages of the attrs solution.
When working with multiple sources, all advantages of the dim solution.
Disadvantages:
Shared functions need to consider both cases.
Mixing Datasets using one style with Datasets using the other style leads to surprising results.
Explicit conversions necessary.
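The explicit conversions the hybrid option would need could look roughly like this. It is only a sketch: the helper names source_attr_to_dim and source_dim_to_attr are hypothetical and not part of any existing API.

```python
import xarray as xr


def source_attr_to_dim(da):
    """Move attrs['source'] into a length-one 'source' dimension (hypothetical helper)."""
    attrs = dict(da.attrs)
    source = attrs.pop("source")
    out = da.expand_dims(source=[source])
    out.attrs = attrs
    return out


def source_dim_to_attr(da):
    """Move a length-one 'source' dimension into attrs['source'] (hypothetical helper)."""
    if da.sizes["source"] != 1:
        raise ValueError("can only convert arrays holding exactly one source")
    source = str(da["source"].values[0])
    out = da.squeeze("source", drop=True)
    out.attrs = {**da.attrs, "source": source}
    return out


# Made-up example data in the attrs style.
attrs_style = xr.DataArray(
    [1.0, 2.0],
    dims="area",
    coords={"area": ["DEU", "FRA"]},
    attrs={"source": "FAO"},
)
dim_style = source_attr_to_dim(attrs_style)
round_trip = source_dim_to_attr(dim_style)
```

Every shared function would have to call one of these helpers, or branch on which style it received, which is the maintenance cost listed above.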
My gut reaction at the moment (especially now that we have da.pr.set() and therefore don't have to deal with da.loc[sel] = array[..., np.newaxis] anymore, the np.newaxis really pissed me off) would be to standardize on "one or more sources, in dim".
I think one or more sources in dim is best. I'm a bit worried about memory use though. But we'll see.
And for displaying data, a to_dataframe() or to_interchangeformat() function would be great, so you get the table as in a CSV file without all the NaNs.
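As a rough idea of what such a table could look like, here is a sketch that uses xarray's existing DataArray.to_dataframe() together with pandas dropna(). It is not a proposal for the final function, and the data and the name "emissions" are made up.

```python
import numpy as np
import xarray as xr

# Sparse example: each source only covers one area, the rest is NaN.
da = xr.DataArray(
    [[1.0, np.nan], [np.nan, 2.5]],
    dims=("source", "area"),
    coords={"source": ["FAO", "Andrew"], "area": ["DEU", "FRA"]},
    name="emissions",
)

# Long-format table with one row per non-NaN value, as in a CSV file.
table = da.to_dataframe().dropna().reset_index()
```

The resulting table has one column per dimension plus a value column, and only the rows that actually hold data.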