implement hash for indices #3885
Comments
i've got something going here passing index tests but not multiindex. think there's an issue with tuple array copying possibly. although that doesn't give regular indexes with tuples a problem. might need to hash levels and labels as well for multiindexes. |
cc #3884. |
@cpcloud What's the goal here? - Under the hood, when doing things like key lookups, Python first uses

def test_copies_have_same_hash():
    df = mkdf(...)
    ind = df.index
    ind2 = df.copy().index
    s = set([ind])
    assert ind in s
    assert ind2 in s

def test_different_indices_can_not_look_each_other_up():
    df = mkdf(...)
    df2 = mkdf(...)  # different length or something
    ind = df.index
    ind2 = df2.index
    d = {ind: 5}
    assert ind in d
    assert ind2 not in d
    assert d[ind] == 5

Then you'd also have to change

def __eq__(self, other):
    # isinstance, not issubclass: other is an instance, not a class
    if isinstance(other, Index):
        return (self.values == other.values).all()
    else:
        # either return False or do something else... |
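The contract those tests rely on (objects that compare equal must hash equal) can be sketched with a toy class. `ToyIndex` is a hypothetical stand-in, not pandas API, and it assumes the values fit in a tuple:

```python
class ToyIndex:
    """Hypothetical immutable index used only to illustrate the contract."""

    def __init__(self, values, name=None):
        # Store values as a tuple so they cannot be mutated in place.
        self._values = tuple(values)
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, ToyIndex):
            return NotImplemented
        return self._values == other._values

    def __hash__(self):
        # Hash exactly the data that __eq__ compares, nothing more.
        return hash(self._values)

    def copy(self):
        return ToyIndex(self._values, self.name)


ind = ToyIndex([1, 2, 3])
ind2 = ind.copy()
other = ToyIndex([1, 2, 3, 4])

seen = {ind}
lookup = {ind: 5}
```

Here `ind2` is found in `seen` and retrieves the value stored under `ind`, while `other` is not, which is the behaviour the two tests above ask for.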
ah there's a problem because |
nice catch |
If there is a need for this, you could create a 'HashableIndex' that has a
|
hm that is not a bad idea...Pul Rekwest™! |
Is your hashing scheme working right now? Otherwise could just default to On Fri, Jun 14, 2013 at 7:08 PM, Phillip Cloud [email protected]:
|
the hashing scheme is not working at the moment for multiindexes, and i haven't incorporated ur tests above with it. currently using sha1 hash of view of array as bytes plus xor-ing that with the sha1 of the index name, dtype, inferred_type, and class name. probably dtype and inferred type might be redundant but dtype hash gives u endian info so prolly should keep it if going to use this scheme. tbh it sounds like u might know more about this than i do so feel free to take it... |
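A rough sketch of that scheme, not the actual branch code. It assumes the metadata fields are hashed together in one SHA-1 digest (the comment does not say whether they are combined or hashed separately), and it uses a stdlib `array.array` as a stand-in for the ndarray. `sketch_index_hash` and its parameters are hypothetical names:

```python
import hashlib
from array import array


def _sha1_int(data: bytes) -> int:
    """Return the SHA-1 digest of data as a Python int."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def sketch_index_hash(values, name, dtype, inferred_type, class_name) -> int:
    # Hash the raw bytes of the underlying buffer.
    data_hash = _sha1_int(memoryview(values).tobytes())
    # Assumption: metadata is joined with a separator and hashed once,
    # so swapping two fields changes the result.
    meta = "\x00".join(str(x) for x in (name, dtype, inferred_type, class_name))
    meta_hash = _sha1_int(meta.encode("utf-8"))
    # Combine the two digests with exclusive or.
    return data_hash ^ meta_hash


a = array("q", [1, 2, 3])
h_a = sketch_index_hash(a, "idx", "int64", "integer", "Int64Index")
h_copy = sketch_index_hash(array("q", a), "idx", "int64", "integer", "Int64Index")
h_renamed = sketch_index_hash(a, "other", "int64", "integer", "Int64Index")
h_values = sketch_index_hash(array("q", [1, 2, 4]), "idx", "int64", "integer", "Int64Index")
```

Equal data and metadata give equal hashes, and changing either the values or the name changes the result. Buffers of Python object pointers (such as arrays of tuples) would need different handling, since their raw bytes are addresses rather than values.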
defaulting to repr will be slow for large indices since indexes repr valid python code iirc |
I don't know a ton, but here's my limited understanding: hashes are sorted On Fri, Jun 14, 2013 at 7:51 PM, Phillip Cloud [email protected]:
|
sure yes. basic properties of hashing. i thought u might be a hashing wizard ;) magic constants always make me die a little inside... |
what's the reason we even want hashability? |
that really is the question. |
consistency with the idea that indexes are immutable is the only thing that comes to mind so far from me...but i could be thinking small here. original question was about memoization of a frame, but that's really a non-starter anyway since those will never be hashable in the python sense |
@cpcloud yeah, I'm definitely no hashing master or whatever. :P anyways, you can see jtratner/pandas@880a8ea for a framework for a kind of hashable index _class_... Personally, I don't think it really gets you anything special. (also, I might have used too much magic there...not sure) |
related #2461 |
@jtratner I believe with your new identity fixes we can close this? |
it's not quite the same thing (for example, if you pickled and unpickled an Index, the |
ok....close ? or explicitly define |
|
would be nice as @jreback says to implement this since indices are supposed to be immutable. currently they try to hash the underlying ndarray, which of course fails because it is mutable.
succinct reasoning behind why you need immutability for hashables
python docs quote about using ^ (exclusive or) in the implementation
but see this answer for a way to hash numpy arrays.
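Why mutability and hashing do not mix can be shown with a small sketch. `MutableKey` is a hypothetical class that hashes its current contents, which is what hashing a mutable array by value would amount to:

```python
class MutableKey:
    """Hypothetical key whose hash depends on mutable state."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        return isinstance(other, MutableKey) and self.values == other.values

    def __hash__(self):
        # The hash changes whenever self.values changes.
        return hash(tuple(self.values))


key = MutableKey([1, 2, 3])
seen = {key}
found_before = key in seen

# Mutate the key after it has been stored in the set.
key.values.append(4)
found_after = key in seen
probe_found = MutableKey([1, 2, 3]) in seen
```

After the mutation the set still holds the object, but neither the object itself nor a fresh key with the old contents can find it: the stored hash no longer matches the new contents, and the old contents no longer compare equal.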