Series.isin fails (errors) for categoricals #16639
Comments
I'm guessing the fix for this looks something like #16543. The algorithms file went through some refactoring, and this is a case that probably got missed.
This fixes it, though I think we should add some asv benchmarks with categoricals to make sure they hit the right code path.
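The asv suggestion could look roughly like the following sketch, written in airspeed velocity's conventions (asv calls `setup` before timing and times each method whose name starts with `time_`). The class name, data sizes and ids are hypothetical; this is an illustration, not the benchmark pandas actually added.

```python
import numpy as np
import pandas as pd


class IsInCategorical:
    """Hypothetical asv-style benchmark for Series.isin with categoricals.

    asv runs setup() before timing and times every method named time_*.
    """

    def setup(self):
        n = 100_000
        # Integer ids in [0, 1000), each repeated 100 times.
        self.series = pd.Series(np.arange(n) % 1000)
        # Lookup set: every even id, stored as a categorical.
        self.cat_values = pd.Series(np.arange(0, 1000, 2), dtype="category")
        # The same ids as a categorical Series, for the reverse case.
        self.cat_series = self.series.astype("category")

    def time_isin_categorical_values(self):
        # Plain integer Series looked up in categorical values
        # (the combination reported in this issue).
        self.series.isin(self.cat_values)

    def time_isin_categorical_series(self):
        # Categorical Series looked up in plain integer values.
        self.cat_series.isin(self.cat_values.astype("int64"))
```

Keeping both directions (plain-in-categorical and categorical-in-plain) in one class lets a regression on either code path show up in the same benchmark run.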
@aviolov, want to push a PR for the above fix?
@jreback, at the risk of sounding ignorant: how would I do that? (Maybe a link to some documentation or a how-to?)
@aviolov, which part, specifically? All the contributing docs are at http://pandas.pydata.org/pandas-docs/stable/contributing.html. If you have any additional questions, just ask them here.
@TomAugspurger, thanks for the link. I guess a "PR" is a pull request in this case. Is the idea that I download version 0.20.3 and check that my minimal example above now works, or that I branch the current version, implement the fix suggested above, and then push it back, or...? I haven't made a branch off pandas before, but it would be fun to try; the how-to looks quite comprehensive.
@aviolov, you'll fork the repo as described in http://pandas.pydata.org/pandas-docs/stable/contributing.html#forking. Then create a new branch. Then apply your changes:
Then push and make a pull request (PR).
@TomAugspurger, cool, I'll give it a try.
I could not get
You don't need to squash.
Code Sample, a copy-pastable example if possible
Problem description
I get an error in 0.20.1:
  File "", line 12, in <module>
    select_ids = DFtrades['id'].isin(DFscores['id']);
  File "C:\Users\alexandre\Anaconda3\lib\site-packages\pandas\core\series.py", line 2555, in isin
    result = algorithms.isin(_values_from_object(self), values)
  File "C:\Users\alexandre\Anaconda3\lib\site-packages\pandas\core\algorithms.py", line 421, in isin
    return f(comps, values)
  File "C:\Users\alexandre\Anaconda3\lib\site-packages\pandas\core\algorithms.py", line 399, in <lambda>
    f = lambda x, y: htable.ismember_object(x, values)
  File "pandas\_libs\hashtable_func_helper.pxi", line 428, in pandas._libs.hashtable.ismember_object (pandas\_libs\hashtable.c:29677)
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'
Expected Output
A boolean Series indicating that the third row of DFtrades is not in DFscores but the other three are.
For reference, this worked (no error) in 0.19.x.
This code also works as expected:
    select_ids = DFtrades['id'].isin(DFscores['id'].values)
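The original code sample was not preserved. The following is a minimal sketch of the failing call and the workaround under stated assumptions: the two frames and their contents are hypothetical, and it assumes `DFscores['id']` is the categorical column while `DFtrades['id']` holds plain integers. The ids are chosen so that only the third trade is missing from the scores, matching the expected output. In 0.20.1 the first `isin` call raised the `ValueError` shown; on current pandas releases both calls should return the same boolean Series.

```python
import pandas as pd

# Hypothetical stand-ins for the reporter's frames (the original sample was
# not preserved). Assumption: DFscores['id'] is the categorical column.
DFtrades = pd.DataFrame({"id": [1, 2, 3, 4]})
DFscores = pd.DataFrame({"id": pd.Categorical([1, 2, 4])})

# The call from the traceback: raised "Buffer dtype mismatch" in 0.20.1.
select_ids = DFtrades["id"].isin(DFscores["id"])

# The reported workaround: pass the underlying values (a Categorical here)
# instead of the Series itself.
select_ids_workaround = DFtrades["id"].isin(DFscores["id"].values)

print(select_ids.tolist())  # [True, True, False, True]
print(select_ids.equals(select_ids_workaround))  # True
```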
Output of pd.show_versions():
INSTALLED VERSIONS
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None
pandas: 0.20.1
pytest: 3.1.1
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.19.0
xarray: 0.9.5
IPython: 6.1.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.10
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None