-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: hash_pandas_object fails on array containing tuple #28969 #30508
Conversation
pandas/core/util/hashing.py
Outdated
except ValueError: | ||
# GH#28969 if we contain tuples, astype fails here, | ||
# apparently related to github.com/numpy/numpy/issues/9441 | ||
str_vals = np.array([str(x) for x in vals], dtype=object) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm not sure about this - wouldn't this really make everything hashable at this point, including non-hashable items?
>>> vals = np.array([("a",), ["c", "d"]])
>>> str_vals = np.array([str(x) for x in vals], dtype=object)
array(["('a',)", "['c', 'd']"], dtype=object)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point
pandas/core/util/hashing.py
Outdated
@@ -306,8 +307,15 @@ def hash_array( | |||
vals = hashing.hash_object_array(vals, hash_key, encoding) | |||
except TypeError: | |||
# we have mixed types | |||
try: | |||
str_vals = vals.astype(str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would handle this in hash_object_array as a case
moved the tuple-catching to hash_object_array, added check for non-hashable within a tuple |
also have 2 tests in the OP
|
Also needs a whatsnew note. |
these are the df1 and df2 in the test that this adds.
updated. |
Thanks! |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
ATM this implements a smoke-test that the cases don't raise, but I don't have a canonical answer to test them against. Could hard-code the result I get now. Thoughts?