dask.base.tokenize() fails when one of the arguments is a Quantity #1313
that indeed seems like a bug, the code currently assumes that if you're calling `__dask_tokenize__` the magnitude is a `dask` array. Although it doesn't change much in this case, please note that the most recent version of `pint` is 0.17.
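To illustrate the kind of assumption being described (a hypothetical sketch, not `pint`'s actual source):

```python
# Hypothetical sketch (not pint's real implementation): a
# __dask_tokenize__ that only works for dask-backed magnitudes.
import numpy as np

class Quantity:
    def __init__(self, magnitude, units):
        self._magnitude = magnitude
        self.units = units

    def __dask_tokenize__(self):
        # dask arrays carry a unique `.name`; a plain numpy array does
        # not, so this raises AttributeError for numpy-backed Quantities
        return (type(self), self._magnitude.name, self.units)

q = Quantity(np.arange(3), "m")
# q.__dask_tokenize__()  # AttributeError: 'numpy.ndarray' object has no attribute 'name'
```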
Thanks for raising this issue! This brings up an interesting point: should `tokenize()` work on a `Quantity` that doesn't wrap a `dask` array in the first place? @mnlevy1981, as far as your point on "not using `pint` correctly in a `dask` setting" goes: […]
This error does still appear in the latest version, but I had a notebook that used to work so I just backed up to like 0.10 and then stepped forward until I found the last working version. I haven't actually run it in 0.16, but it's definitely broken in 0.15 and 0.17
It's been a while since I wrote the code, but I believe I have a dictionary of various unit conversion factors that are all [scalar] Pint Quantities, and then I'm multiplying several `dask` arrays by them.
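If I understand correctly, that pattern might look roughly like this (a sketch with made-up names, not the actual notebook code):

```python
# Sketch of the described setup (names are made up): scalar pint
# Quantities as conversion factors, multiplied into dask-backed arrays.
import dask.array as da
import pint

ureg = pint.UnitRegistry()

# dictionary of scalar unit conversion factors
conversion_factors = {
    "per_second_to_per_day": 86400.0 * ureg("second / day"),
    "mol_to_mmol": 1000.0 * ureg("mmol / mol"),
}

flux = da.ones((365, 180, 360), chunks=(30, 180, 360))
daily_flux = conversion_factors["per_second_to_per_day"] * flux
```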
well, if the magnitudes are all scalars, computing their token should be cheap. As for the unique identifier: if it's not too expensive, should we just delegate to the magnitude (i.e. `tokenize(self._magnitude)`)? Edit: that way, […]
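For concreteness, a minimal sketch of that delegation idea (my illustration, not necessarily what the PR does):

```python
# Sketch of delegating the token to the magnitude (illustration only):
# let dask's tokenize handle whatever the magnitude happens to be, and
# include type and units so distinct quantities get distinct tokens.
from dask.base import tokenize

class Quantity:
    def __init__(self, magnitude, units):
        self._magnitude = magnitude
        self.units = units

    def __dask_tokenize__(self):
        # tokenize() accepts numpy arrays, dask arrays, and plain
        # scalars alike, so no dask-backed magnitude is assumed
        return (type(self), tokenize(self._magnitude), str(self.units))
```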
[…] Once you do, I'd love to see how you use it and where the pain points currently are (if any).
I see you already opened a PR @keewis, thanks. I'll still chime in briefly.
Would a particularly large, say 5 GB or larger, numpy array be too expensive? I imagine tokenizing doesn't read the whole array, just its memory location, but I'm not completely familiar with the process.
If I read the code for `tokenize` correctly, it does hash the array's data, so the cost grows with the size of the array:

```python
In [2]: from dask.base import tokenize
   ...: import numpy as np
   ...:
   ...: for size in (100, 10000, 100000000):
   ...:     a = np.linspace(0, 1, size)
   ...:     print("size:", a.nbytes)
   ...:     %timeit tokenize(a)
   ...:
size: 800
23.8 µs ± 101 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
size: 80000
34.4 µs ± 193 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
size: 800000000
105 ms ± 8.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
Something changed between 0.14 and 0.15 (#1151 seems like a likely culprit based on the issue it fixed), and now passing `pint.Quantity` variables to `dask.base.tokenize()` fails. Here's a simple script:
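The script itself didn't survive the copy; a minimal reconstruction (my guess at its shape, not the original) would be:

```python
# Reconstruction (not the original script): tokenize a pint Quantity
# wrapping a plain numpy array.
import numpy as np
import pint
from dask.base import tokenize

ureg = pint.UnitRegistry()
q = np.linspace(0, 1, 10) * ureg.meter

print(tokenize(q))  # works on pint 0.14; fails on 0.15 and 0.17 per the report
```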
It runs fine with 0.14, but not with 0.15.

It looks like #1151 introduced a `test_dask_tokenize()` test that is presumably passing, so there's a pretty good chance that I'm just not using `pint` correctly in a `dask` setting :)