feature: multi-dimensional bins #28
Comments
I think this Dask PR is related: dask/dask#7346 |
After quickly skimming np.histogram2d and this issue, I think that I am asking for a different thing here: I want that |
I think this is a really nice contribution that would be appreciated by many users, though I'm not sure whether it's out of scope for xhistogram (thoughts @rabernat?). One simple option could be to allow |
@aaronspring I'm trying to make sure I understand what you're proposing here - is this an accurate summary of what you're suggesting?: If I have data which includes a time dimension, and I currently count along other dimension(s), my result will still have a time dimension, but the resultant bin coordinates are currently only allowed to be one-dimensional. You are proposing to be able to pass a multidimensional bins array (for each variable potentially for an N-D histogram) which has bin edges that are a function of time. The result would have the same shape array for the bin_counts, but the bin coordinates on the output would vary along this time dimension (passed straight through from your input). You would have ended up with a histogram whose counts and bin edges changed over time - effectively a set of separate histograms calculated independently for each point in time. If that's what you're suggesting then it actually sounds fairly doable - it's basically just allowing the bins arguments to be ND and then making sure they broadcast properly. You would also need an input check that your bins arrays don't vary along any of the dimensions you want to count over, because I think that would be nonsensical. |
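The summary above (counts with bin edges that vary along a retained dimension such as time) can be sketched with plain NumPy. This is only an illustration under stated assumptions, not xhistogram's implementation: the helper name `histogram_per_slice` is hypothetical, the retained dimension is assumed to be the leading axis, and the tercile-style edges are made up for the example.

```python
import numpy as np


def histogram_per_slice(data, bin_edges):
    """Count samples per time step, using a different set of bin edges each step.

    data:      array of shape (n_time, n_samples)
    bin_edges: array of shape (n_time, n_bins + 1), increasing along axis 1
    Returns an integer array of shape (n_time, n_bins).
    """
    data = np.asarray(data)
    bin_edges = np.asarray(bin_edges)
    # The bins may vary along the retained dimension (time) only,
    # never along the dimension that is counted over.
    if data.shape[0] != bin_edges.shape[0]:
        raise ValueError("data and bin_edges must share the leading (time) axis")
    return np.stack(
        [np.histogram(row, bins=edges)[0] for row, edges in zip(data, bin_edges)]
    )


rng = np.random.default_rng(0)
# Four time steps whose distribution drifts upward, so fixed 1-D bins would fit poorly.
data = rng.normal(size=(4, 1000)) + np.arange(4)[:, None]
# Tercile edges computed separately for every time step: shape (4, 4).
bin_edges = np.quantile(data, [0, 1 / 3, 2 / 3, 1], axis=1).T
counts = histogram_per_slice(data, bin_edges)
```

The result keeps the time dimension, and each row of `counts` belongs to its own row of `bin_edges`, which is the "set of separate histograms" reading described in the comment above.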
@TomNicholas I didn't see that linked PR, thanks for linking it again here. Simply put: I want to use N-D instead of 1-D arrays as bins. This is the kind of API/code example I was looking for: https://gist.github.com/aaronspring/251553f132202cc91aadde03f2a452f9 (don't focus on the results, just the dimensionality) |
Here's how to do it with flox: xarray-contrib/flox#203 [well hopefully I got it right ;)] |
Well this looks great!
I did have a little go in the airport after AMS but didn't get anywhere
near this far 😅
Next step would be to test it against histogram (for the cases xhistogram
already handles).
What's the plan with flox integration into Xarray? Will it always be
optional? Will it become part of main?
Rendered version
<https://flox--203.org.readthedocs.build/en/203/user-stories/nD-bins.html>
|
Apparently flox will stay optional, so if we moved this functionality into xarray it would rely on an optional import, but that's okay. |
@dcherian how might this work for N-dimensional histograms? I.e. placing N variables into N sets of bins. That's obviously one of the main features xhistogram provides. I notice your notebook says
Does that mean we would still have to do a reshape of some kind? |
https://flox.readthedocs.io/en/latest/intro.html#histogramming-binning-by-multiple-variables Does this make it clear? |
Ooh! Checking I've understood this correctly:
xarray_reduce(
    da,  # weights, here just ones
    "labels1",  # name of 1st variable we want to bin
    "labels2",  # name of 2nd variable we want to bin
    func="count",  # count occurrences falling in bins
    expected_groups=(
        pd.IntervalIndex.from_breaks([-0.5, 4.5, 6.5, 8.9]),  # bins for 1st variable
        pd.IntervalIndex.from_breaks([0.5, 1.5, 1.9]),  # bins for 2nd variable
    ),
) |
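To make the intent of that call concrete without depending on flox, here is a rough pandas-only sketch of the same idea: place two variables into two sets of interval bins and count the occurrences in each pair of bins. It is not what flox does internally, and the sample values in `labels1` and `labels2` are invented for the illustration; only the bin breaks are taken from the snippet above.

```python
import numpy as np
import pandas as pd

# Made-up sample values for the two variables being binned.
labels1 = np.array([1, 2, 5, 5, 7, 8])
labels2 = np.array([1, 1, 1, 1.7, 1.7, 1])

# Same bin breaks as in the snippet above (intervals are closed on the right).
bins1 = pd.IntervalIndex.from_breaks([-0.5, 4.5, 6.5, 8.9])
bins2 = pd.IntervalIndex.from_breaks([0.5, 1.5, 1.9])

# Assign every sample to an interval of each variable.
cat1 = pd.Series(pd.cut(labels1, bins1), name="labels1_bins")
cat2 = pd.Series(pd.cut(labels2, bins2), name="labels2_bins")

# Weights of one, so counting them gives the 2-D histogram.
weights = pd.Series(np.ones(labels1.size))
counts = weights.groupby([cat1, cat2], observed=False).count().unstack()
```

`counts` is a table with one row per `labels1` bin and one column per `labels2` bin; `observed=False` keeps the empty bin combinations as zeros instead of dropping them.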
Yes. PR to improve that page is very welcome! |
Currently, xhistogram only allows bins to be one-dimensional.
However, when the bin edges vary in time (seasonality) or space (locations on the globe), xhistogram cannot be used, because there is a hard-coded requirement for each element of bins to be 1-D.
One such multi-dimensional bin application is the ranked probability score (RPS) we use in xskillscore.rps, where we want to know how many forecasts fell into which bins. Bins are often defined as terciles of the forecast distribution, and the bin edges for these terciles (forecast_with_lon_lat_time_dims.quantile(q=[.33,.66],dim='time')) depend on lon and lat.
How we solved this in xskillscore.rps: the < comparison gives us CDFs, and diff brings it back to histograms. We may have to throw away the upper edge.
https://github.com/xarray-contrib/xskillscore/blob/493f9afd7b5acefb4baa47bec6ad65fca19965bd/xskillscore/core/probabilistic.py#L680
I first implemented rps with xhistogram, then with the snippet above; both yield the same results.
However, I am not sure whether such multi-dimensional bins would be an interesting addition to xhistogram or are out-of-scope.
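The CDF-then-diff idea described above can be illustrated with a small NumPy sketch. This is not the xskillscore code: the helper name `counts_from_cdf`, the array layout (members on the first axis, locations on the second) and the choice to add open-ended lowest and highest categories are all assumptions made for the example.

```python
import numpy as np


def counts_from_cdf(forecasts, edges):
    """Per-location category counts via cumulative counts and a difference.

    forecasts: array of shape (n_member, n_loc)
    edges:     array of shape (n_edge, n_loc), increasing along axis 0,
               so every location has its own bin edges
    Returns an integer array of shape (n_edge + 1, n_loc): the number of
    members below the first edge, between consecutive edges, and at or
    above the last edge.
    """
    n_member, n_loc = forecasts.shape
    # Cumulative count: how many members lie strictly below each edge.
    below = (forecasts[None, :, :] < edges[:, None, :]).sum(axis=1)
    # Pad with 0 (nothing below -inf) and n_member (everything below +inf).
    cumulative = np.concatenate(
        [
            np.zeros((1, n_loc), dtype=int),
            below,
            np.full((1, n_loc), n_member, dtype=int),
        ],
        axis=0,
    )
    # Differencing the cumulative counts recovers the per-category histogram.
    return np.diff(cumulative, axis=0)


# Five ensemble members at two locations with very different value ranges.
forecasts = np.array([[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]])
# Two edges per location, i.e. three categories per location.
edges = np.array([[2.5, 15.0], [4.5, 25.0]])
counts = counts_from_cdf(forecasts, edges)
```

Because the comparison broadcasts over the location axis, the edges can differ at every location, which is the multi-dimensional bins behaviour requested in this issue.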