Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: hash_pandas_object fails on array containing tuple #28969 #30508

Merged
merged 5 commits into from
Dec 31, 2019

Conversation

jbrockmendel
Copy link
Member

ATM this implements a smoke-test that the cases don't raise, but I don't have a canonical answer to test them against. Could hard-code the result I get now. Thoughts?

except ValueError:
# GH#28969 if we contain tuples, astype fails here,
# apparently related to github.com/numpy/numpy/issues/9441
str_vals = np.array([str(x) for x in vals], dtype=object)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm not sure about this - wouldn't this really make everything hashable at this point, including non-hashable items?

>>> vals = np.array([("a",), ["c", "d"]])
>>> str_vals = np.array([str(x) for x in vals], dtype=object)
array(["('a',)", "['c', 'd']"], dtype=object)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point

@@ -306,8 +307,15 @@ def hash_array(
vals = hashing.hash_object_array(vals, hash_key, encoding)
except TypeError:
# we have mixed types
try:
str_vals = vals.astype(str)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would handle this in hash_object_array as a case

@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Dec 27, 2019
@jbrockmendel
Copy link
Member Author

moved the tuple-catching to hash_object_array, added check for non-hashable within a tuple

@jbrockmendel jbrockmendel marked this pull request as ready for review December 27, 2019 21:21
@jreback jreback added this to the 1.0 milestone Dec 30, 2019
@jreback
Copy link
Contributor

jreback commented Dec 30, 2019

also have 2 tests in the OP

pd.util.hash_pandas_object(pd.DataFrame({'data': [tuple('1'), tuple('2')]})) # fails
pd.util.hash_pandas_object(pd.DataFrame({'data': [tuple([1]), tuple([2])]})) # fails

@TomAugspurger
Copy link
Contributor

Also needs a whatsnew note.

@jbrockmendel
Copy link
Member Author

also have 2 tests in the OP

these are the df1 and df2 in the test that this adds.

Also needs a whatsnew note.

updated.

@TomAugspurger TomAugspurger merged commit f8a0989 into pandas-dev:master Dec 31, 2019
@TomAugspurger
Copy link
Contributor

Thanks!

@jbrockmendel jbrockmendel deleted the bug-hashtup branch December 31, 2019 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

hash_pandas_object fails on tuple
4 participants