Implement Dask collection interface #1129
Conversation
Add `__dask_*__` methods to the Quantity class. Add convenience functions `compute()`, `persist()`, and `visualize()`. Wrap these convenience functions to raise an AttributeError if the wrapped object is not a Dask Array.
Update downcast tests to include Dask Array. Add testing module for Dask collection interface and convenience methods.
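For illustration only, here is a minimal sketch of the guarded-convenience-method idea described above (the helper name `check_dask_array`, the error message, and the stripped-down `Quantity` are assumptions for this sketch, not the PR's actual code):

```python
import functools

import dask
import dask.array as da


def check_dask_array(f):
    """Raise AttributeError when the wrapped magnitude is not a dask.array.Array."""

    @functools.wraps(f)
    def wrapper(self, *args, **kwargs):
        if isinstance(self._magnitude, da.Array):
            return f(self, *args, **kwargs)
        raise AttributeError(
            f"{f.__name__}() is only available when the magnitude is a dask array, "
            f"not {type(self._magnitude)}"
        )

    return wrapper


class Quantity:
    """Minimal stand-in for pint's Quantity, keeping only what this sketch needs."""

    def __init__(self, magnitude, units=""):
        self._magnitude = magnitude
        self._units = units

    @check_dask_array
    def compute(self, **kwargs):
        # Evaluate the lazy dask graph, then re-wrap the concrete result with its units.
        (magnitude,) = dask.compute(self._magnitude, **kwargs)
        return type(self)(magnitude, self._units)
```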
I'll take a look and see if I can set up something similar. Thanks!
Change these methods to `@property`s and have them redirect to the appropriate Dask Array methods. Tests are included. Add a test utility function for incrementing test quantities.
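As a rough, hedged sketch of what redirecting these hooks to Dask Array via properties might look like (not necessarily the PR's exact code; it assumes the magnitude is stored on `self._magnitude`):

```python
import dask.array as da


class Quantity:
    # ... construction and unit handling elided; assumes self._magnitude exists ...

    def __dask_graph__(self):
        # Expose a task graph only when the magnitude really is a dask Array;
        # returning None lets dask treat the Quantity as a plain value.
        if isinstance(self._magnitude, da.Array):
            return self._magnitude.__dask_graph__()
        return None

    def __dask_keys__(self):
        return self._magnitude.__dask_keys__()

    @property
    def __dask_optimize__(self):
        # Redirect to dask.array's graph optimization pass
        return da.Array.__dask_optimize__

    @property
    def __dask_scheduler__(self):
        # Redirect to dask.array's default scheduler
        return da.Array.__dask_scheduler__
```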
I made all of the suggested changes and added tests for `visualize()` and for explicitly checking that
This is looking really good; please feel free to ping me when you want me to review and merge it. I would like to take the opportunity to ask a question: when are we going to provide an external package (e.g.
My concern with extension packages is that they won't necessarily receive the same level of maintenance as the core pint library. As a maintainer of a downstream library, I'm always concerned about adding more dependencies. What do people see as the benefits of having external packages?
@rpmanser As far as implementing the collection interface and testing it goes, this looks great! One small suggestion I would have is referring to dask/dask#4583 in the NumPy doc example, since it is Dask holding us back from the usual "multiply on the right" construction. Also, I think this gets enough of #883 taken care of to consider it closed, but a follow-up issue should be raised as a reminder to finish up tests once dask/dask#4583 is resolved (masked arrays and numpy/numpy#15200 are in the same position).

One bigger-picture question/discussion point: in pydata/xarray#525 and #883 (and in other discussions), I was thinking that a Pint Quantity "acting like a Dask Array" or "being a duck Dask Array" would occur if and only if the Quantity is wrapping a Dask Array. That is, a library like xarray could recognize a suitable Dask Array-like by a test like

So, with that in mind, is it okay for xarray (and any other libraries depending on the duck Dask Array interfacing of Quantity) to have Quantity always be a duck Dask Array, or do these interface methods need to be "hidden" and only available when the magnitude of a Quantity is a Dask Array? While there ostensibly is another option (a "DaskQuantity" subclass that has these methods available), I think that would make the situation worse, since too much magic/special casing may be required to get a DaskQuantity as the result of arbitrary operations between Dask Arrays and regular Quantities.
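(The inline snippet in the comment above was lost in extraction and is left as a gap. As a hedged illustration only, a consumer library's duck-typing check could take roughly the following form; this is an assumption for illustration, not xarray's actual code.)

```python
import dask


def is_duck_dask_array(obj):
    # dask.is_dask_collection(obj) is True only when obj.__dask_graph__() returns a
    # non-None graph, so a Quantity wrapping a plain ndarray would not qualify if its
    # interface methods return None in that case. The "chunks" check narrows the test
    # to array-like collections.
    return dask.is_dask_collection(obj) and hasattr(obj, "chunks")
```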
This is ready for your review, @hgrecco. Looks like the Travis CI build has been failing due to the
@jthielen In the absence of documentation on some kind of protocol, this might be one of those things that has to be worked out in practice. Having said that, personally I don't consider
Sounds good. I've also asked over in xarray (pydata/xarray#4208), so we'll see if we get any feedback there.
bors r+
Congratulations @rpmanser for the awesome work!
Build succeeded:
Thanks, happy I could help!
This pull request implements the Dask collection interface for the Quantity class and adds the convenience methods `compute()`, `persist()`, and `visualize()`. As is, only the convenience methods are wrapped to check that the magnitude of the Quantity is a Dask Array, which should cover current use cases. Ping @jthielen since we've been discussing this.

I have not tested `persist()` on an HPC cluster, which will have different behavior than when it is called on a single machine (e.g., a desktop). Is there a way to integrate tests for distributed cases? If not, I can test this independently if needed.

Executed `black -t py36 . && isort -rc . && flake8` with no errors.
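For context, a hedged usage sketch of what this interface enables (it assumes a Pint version that includes this PR and has dask installed; exact messages and behavior may differ):

```python
import dask.array as da
import pint

ureg = pint.UnitRegistry()

# A Quantity wrapping a lazy dask array stays lazy...
q = ureg.Quantity(da.ones((1000, 1000), chunks=(100, 100)), "meter")

# ...until compute() evaluates the graph and returns a Quantity with a NumPy magnitude.
result = q.compute()
print(type(result.magnitude), result.units)

# The guarded convenience methods refuse non-dask magnitudes.
scalar = ureg.Quantity(1.0, "second")
try:
    scalar.compute()
except AttributeError as err:
    print(err)
```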