BUG: ValueError with Series.isin and tuples #16394

Closed
wmp3 opened this issue May 20, 2017 · 3 comments · Fixed by #16434
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode

@wmp3

wmp3 commented May 20, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
df['C'] = list(zip(df['A'], df['B']))
df['C'].isin([(1, 'a')])

Problem description

Raises a ValueError:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/anaconda/envs/pandas_dev/lib/python3.6/site-packages/pandas/core/series.py", line 2555, in isin
    result = algorithms.isin(_values_from_object(self), values)
  File "/anaconda/envs/pandas_dev/lib/python3.6/site-packages/pandas/core/algorithms.py", line 421, in isin
    return f(comps, values)
  File "/anaconda/envs/pandas_dev/lib/python3.6/site-packages/pandas/core/algorithms.py", line 399, in <lambda>
    f = lambda x, y: htable.ismember_object(x, values)
  File "pandas/_libs/hashtable_func_helper.pxi", line 428, in pandas._libs.hashtable.ismember_object (pandas/_libs/hashtable.c:29677)
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

Expected Output

In pandas 0.19.2 this returns:
0     True
1    False
2    False
Name: C, dtype: bool

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.0rc2
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.12.1
scipy: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None

@jreback
Contributor

jreback commented May 20, 2017

This code was refactored to be more general, so this was a missing case. Easy fix, I think. np.array turns a list of tuples into a 2-D array instead of a 1-D array of tuples, which is not nice, so use the change in the diff below.

If you'd like to submit a PR with this and an added test (and make sure nothing else breaks), that would be great.

diff --git a/pandas/core/algorithms.py b/pandas/core/algorithms.py
index a745ec6..77d79c9 100644
--- a/pandas/core/algorithms.py
+++ b/pandas/core/algorithms.py
@@ -388,7 +388,7 @@ def isin(comps, values):
                         "[{0}]".format(type(values).__name__))
 
     if not isinstance(values, (ABCIndex, ABCSeries, np.ndarray)):
-        values = np.array(list(values), dtype='object')
+        values = lib.list_to_object_array(list(values))
 
     comps, dtype, _ = _ensure_data(comps)
     values, _, _ = _ensure_data(values, dtype=dtype)
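
To illustrate the dimensionality problem described above, here is a minimal sketch using only NumPy. The helper `to_object_array` is a hypothetical stand-in written for this example; it is not pandas' internal `lib.list_to_object_array`, it only mimics the idea of keeping each tuple as a single element.

```python
import numpy as np

values = [(1, 'a')]

# np.array treats each tuple as a row, so the result is 2-D.
as_2d = np.array(values, dtype='object')
print(as_2d.shape)  # (1, 2)


def to_object_array(items):
    """Hypothetical helper: build a 1-D object array, one element per item."""
    out = np.empty(len(items), dtype='object')
    for i, item in enumerate(items):
        # Scalar-index assignment stores the tuple itself as the element.
        out[i] = item
    return out


as_1d = to_object_array(values)
print(as_1d.shape)  # (1,)
print(as_1d[0])     # (1, 'a')
```

The 2-D shape is what appears to trigger "Buffer has wrong number of dimensions (expected 1, got 2)" in the traceback, since the hashtable routine expects a 1-D buffer.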

@jreback jreback added Bug Difficulty Novice Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels May 20, 2017
@jreback jreback added this to the Next Major Release milestone May 20, 2017
@jreback jreback changed the title ValueError with Series.isin and tuples BUG: ValueError with Series.isin and tuples May 20, 2017
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.20.2, Next Major Release May 20, 2017
@jaredsnyder
Contributor

I'm taking a crack at this. Is the solution to just add lib.list_to_object_array back in along with a test for the tuple case, or should we check if comps contains tuples and use lib.list_to_object_array only if it does?

@jorisvandenbossche
Member

@jaredsnyder I think you can try the exact change that @jreback showed above. When the values are not tuples, both approaches should normally do the same thing, so I don't think it is necessary to check whether they contain tuples. And definitely add a test!
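
A regression test for the tuple case could look roughly like the sketch below. It reuses the example from the original report; the test name is an assumption made for illustration and may not match what was actually added in the PR.

```python
import pandas as pd


def test_isin_with_list_of_tuples():
    # Hypothetical test name; the data mirrors the example in the report.
    df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
    df['C'] = list(zip(df['A'], df['B']))

    result = df['C'].isin([(1, 'a')])

    # Only the first row holds the tuple (1, 'a').
    expected = pd.Series([True, False, False], name='C')
    pd.testing.assert_series_equal(result, expected)
```

On an affected version this would be expected to fail with the ValueError from the report rather than with an assertion error.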

jorisvandenbossche pushed a commit that referenced this issue May 23, 2017
* Switched out "values = np.array(list(values), dtype='object')" for "values = lib.list_to_object_array(list(values))" in the isin() method found in core/algorithms.py
Added test for comparing to a list of tuples
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this issue May 29, 2017
…-dev#16434)

(cherry picked from commit e053ee3)
TomAugspurger pushed a commit that referenced this issue May 30, 2017
(cherry picked from commit e053ee3)
stangirala pushed a commit to stangirala/pandas that referenced this issue Jun 11, 2017
…-dev#16434)
