Alternative dask-powered histogram algorithm using xarray.groupby and numpy_groupies #60
Comments
Thanks @TomNicholas. Well, this looks familiar :). If you start solving all the questions in your notebook, you'll probably end up with something like https://github.com/dcherian/dask_groupby/blob/ec6f13400ab8ccc9269099076a31b44354e8ecf6/dask_groupby/core.py#L167 In any case it's hard to do better (in terms of complexity) than What you could do in (I spent a lot of time thinking about this, so we can chat some time if you want)
Wow a lot of work has gone into that @dcherian !
I think you could do this, but the effect would be similar to Ryan's
Can't we just use
No, that's potentially so expensive it takes down the cluster. It's better to have an algorithm that knows how to deal with the chunks, like #49.
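To illustrate what "knowing how to deal with the chunks" can mean for a histogram, here is a minimal sketch. It is not the implementation in #49: the function name chunked_histogram is hypothetical, and plain NumPy arrays stand in for dask chunks. The point is only that per-chunk counts can be computed independently and then summed, so no step needs the whole array in memory at once.

```python
import numpy as np


def chunked_histogram(chunks, bin_edges):
    """Histogram an iterable of array chunks without concatenating them.

    Hypothetical helper for illustration: each chunk is binned on its own
    and the small per-chunk count vectors are added together.
    """
    bin_edges = np.asarray(bin_edges)
    total = np.zeros(len(bin_edges) - 1, dtype=np.int64)
    for chunk in chunks:
        # Bin one chunk at a time; only the counts are kept afterwards.
        counts, _ = np.histogram(chunk, bins=bin_edges)
        total += counts
    return total


# Usage: split an array into uneven pieces to mimic chunking.
rng = np.random.default_rng(0)
data = rng.normal(size=10_000)
edges = np.linspace(-4.0, 4.0, 17)
pieces = np.array_split(data, 7)
chunked_counts = chunked_histogram(pieces, edges)
```

Because counts are additive, the result matches a single histogram over the full array, while the peak memory is roughly one chunk plus one count vector.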
After @shoyer mentioned earlier today that he had an example of dealing with the ND-histogram problem in xarray by using xarray.apply_ufunc and numpy_groupies, I made this notebook to try it out for creating histograms in xarray.

The basic idea is that da.groupby_bins(bins).apply(count) essentially creates a histogram, and numpy_groupies can speed up the groupby_bins hugely.

I think it's pretty cool that it even works, but you'll see in the notebook that I don't think the performance compares favourably with xhistogram's dask.blockwise implementation (see #49), though I didn't manage to get numba-powered groupies working yet. The dask task graphs are also not as nice.

@rabernat this is the sort of thing I had in mind originally.

@gjoseph92 you might find this interesting as an alternate solution to your blockwise one.

@dcherian and @max-sixty you might find this example interesting as I know you've been working on using numpy_groupies in pydata/xarray#4473.
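The claim that a grouped count over bins is essentially a histogram can be checked with a small sketch. This is not the notebook's code: the function name histogram_via_groupby is hypothetical, and np.bincount is used here as a NumPy-only stand-in for the grouped count that numpy_groupies would perform on the same integer labels.

```python
import numpy as np


def histogram_via_groupby(values, bin_edges):
    """Count values per bin by labelling each value with a bin index
    and then counting the labels (a groupby-count).

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values).ravel()
    bin_edges = np.asarray(bin_edges)
    nbins = len(bin_edges) - 1

    # Step 1 (the "groupby_bins" part): integer bin label per value.
    # np.digitize returns i when bin_edges[i-1] <= x < bin_edges[i].
    idx = np.digitize(values, bin_edges) - 1

    # Match np.histogram, whose last bin includes its right edge.
    idx[values == bin_edges[-1]] = nbins - 1

    # Drop values outside the bin range (NaN also lands outside).
    valid = (idx >= 0) & (idx < nbins)

    # Step 2 (the "count" part): count how often each label occurs.
    return np.bincount(idx[valid], minlength=nbins)


# Usage: compare against np.histogram on the same data.
rng = np.random.default_rng(42)
sample = rng.normal(size=5_000)
edges = np.linspace(-3.0, 3.0, 13)
grouped_counts = histogram_via_groupby(sample, edges)
reference_counts, _ = np.histogram(sample, bins=edges)
```

The two count vectors agree, which is why speeding up the grouped count speeds up the histogram: once the labels exist, the rest is a single aggregation over integer group indices.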