[Upstream Improvement] TrackedArray into scipy/numpy? #283

Closed
Marviel opened this issue Jan 3, 2019 · 3 comments
Comments

@Marviel
Contributor

Marviel commented Jan 3, 2019

(A bit of a weird issue, since it's more about the upstream repos than anything, and it comes from a place of fair ignorance about the subject at hand. Feel free to close :))

It occurred to me today that the TrackedArray data structure could have uses in many different applications across the numpy/scipy ecosystem. If this is the case, would there be interest in trying to get that code into numpy/scipy? If so, what barriers to entry would exist for doing so?

@mikedh
Owner

mikedh commented Jan 3, 2019

That would be cool! They have a nice development guide. Since this involves an API change, it would probably require a discussion on numpy-discussion.

It seems like a discussion is likely to yield a bunch of edge cases where the hashes misrepresent the array haha. I'd probably pitch the API change as a single function, something like:

# proposed API sketch: ndarray.crc() does not exist in numpy today
a = np.random.random(10)
checksum = a.crc()
a[0] *= 2.0
assert checksum != a.crc()
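
Since `a.crc()` above is only a proposed method, here is a minimal sketch of the same check using calls that do exist today, `ndarray.tobytes()` and `zlib.crc32`. The helper name `array_crc` is hypothetical, and this is not necessarily how TrackedArray computes its hash. It also shows one of the edge cases mentioned above: hashing raw bytes ignores shape, so two differently shaped arrays with identical data produce the same value.

```python
import zlib

import numpy as np


def array_crc(a):
    """Hypothetical helper: CRC-32 of the array's raw bytes in C order."""
    # tobytes() copies the data into a new bytes object
    return zlib.crc32(a.tobytes())


a = np.arange(10, dtype=np.float64) + 1.0
before = array_crc(a)
a[0] *= 2.0  # modify the data in place
after = array_crc(a)
assert before != after

# edge case: shape is not part of the raw bytes, so these two collide
same_bytes = array_crc(np.zeros((2, 3))) == array_crc(np.zeros((3, 2)))
assert same_bytes
```

Mixing the shape and dtype into the hashed bytes would be one way to avoid that particular collision, at the cost of a slightly more involved function.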

@mikedh
Owner

mikedh commented Jan 8, 2019

Some quick timing examples of how a hash check compares to various operations:

In [1]: import numpy as np

In [2]: from zlib import adler32

In [3]: from xxhash import xxh64_intdigest

In [6]: a = np.random.random((10000, 3))

In [7]: %timeit xxh64_intdigest(a.tobytes())
34.4 µs ± 346 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [10]: %timeit adler32(a.tobytes())
118 µs ± 1.34 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [12]: %timeit np.dot(a, a.T)
2.05 s ± 33.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [13]: %timeit np.dot(a, [1,2,3])
22.8 µs ± 565 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [14]: %timeit np.cross(a, a)
240 µs ± 4.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
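
To rerun a comparison like this outside IPython, a rough sketch with the standard `timeit` module could look like the following. It sticks to `zlib` checksums because `xxhash` is a third-party package, the helper name `per_call` is hypothetical, and the large `np.dot(a, a.T)` case is left out to keep the run short. Absolute numbers will differ by machine.

```python
import timeit
import zlib

import numpy as np

a = np.random.random((10000, 3))


def per_call(func, number=200):
    """Hypothetical helper: average seconds per call over `number` calls."""
    return timeit.timeit(func, number=number) / number


timings = {
    "adler32(a.tobytes())": per_call(lambda: zlib.adler32(a.tobytes())),
    "crc32(a.tobytes())": per_call(lambda: zlib.crc32(a.tobytes())),
    "np.dot(a, [1, 2, 3])": per_call(lambda: np.dot(a, [1, 2, 3])),
    "np.cross(a, a)": per_call(lambda: np.cross(a, a)),
}

# report each operation in microseconds per call
for label, seconds in timings.items():
    print(f"{label:24s} {seconds * 1e6:10.1f} us")
```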

@mikedh
Owner

mikedh commented Aug 13, 2019

Added to enhancement list, closing for now.

@mikedh mikedh closed this as completed Aug 13, 2019