Value error on groupby bfill() #9970

Closed
arijun opened this issue Apr 22, 2015 · 7 comments
Labels: Bug, Duplicate Report, Groupby, Missing-data

Comments

@arijun

arijun commented Apr 22, 2015

import pandas as pd

s = pd.Series([1,1,0], index=[1,2,2])
s.groupby(s).fillna(method='bfill')  # raises ValueError on pandas 0.16.x

I have a DataFrame with a column containing some strings and NaNs. I can do df.col.bfill(), but when I try df.groupby('col2').col1.bfill() I get a ValueError.
I can do first and apply(lambda x: x is str) without error.
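For context, this is what the grouped backfill is expected to do. It is a minimal sketch assuming a recent pandas and a DataFrame with a default (unique) index; the column names `key` and `val` are hypothetical stand-ins for col2 and col1. `GroupBy.bfill` fills each NaN only from the next valid value in the same group, whereas a plain `Series.bfill` crosses group boundaries.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "key" plays the role of col2, "val" of col1
# (strings mixed with NaN, as in the report).
df = pd.DataFrame({
    "key": ["a", "a", "b", "b"],
    "val": ["x", np.nan, np.nan, "y"],
})

# Plain backfill ignores groups: row 1 borrows "y" from group "b".
plain = df["val"].bfill()

# Grouped backfill only looks ahead within the same key, so row 1 stays NaN.
grouped = df.groupby("key")["val"].bfill()

print(plain.tolist())    # ['x', 'y', 'y', 'y']
print(grouped.tolist())  # ['x', nan, 'y', 'y']
```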

Here is the full traceback:

Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 558, in wrapper
    return self.apply(curried_with_axis)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 662, in apply
    return self._python_apply_general(f)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 669, in _python_apply_general
    not_indexed_same=mutated)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 2380, in _wrap_applied_output
    not_indexed_same=not_indexed_same)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 1142, in _concat_objects
    result = result.reindex(ax)
  File "C:\Anaconda3\lib\site-packages\pandas\core\series.py", line 2149, in reindex
    return super(Series, self).reindex(index=index, **kwargs)
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1731, in reindex
    method, fill_value, copy).__finalize__(self)
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1749, in _reindex_axes
    allow_dups=False)
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1834, in _reindex_with_indexers
    copy=copy)
  File "C:\Anaconda3\lib\site-packages\pandas\core\internals.py", line 3150, in reindex_indexer
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 561, in wrapper
    return self.apply(curried)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 662, in apply
    return self._python_apply_general(f)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 669, in _python_apply_general
    not_indexed_same=mutated)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 2380, in _wrap_applied_output
    not_indexed_same=not_indexed_same)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 1142, in _concat_objects
    result = result.reindex(ax)
  File "C:\Anaconda3\lib\site-packages\pandas\core\series.py", line 2149, in reindex
    return super(Series, self).reindex(index=index, **kwargs)
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1731, in reindex
    method, fill_value, copy).__finalize__(self)
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1749, in _reindex_axes
    allow_dups=False)
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1834, in _reindex_with_indexers
    copy=copy)
  File "C:\Anaconda3\lib\site-packages\pandas\core\internals.py", line 3150, in reindex_indexer
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 570, in wrapper
    return self._aggregate_item_by_item(name, *args, **kwargs)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 511, in __getattr__
    (type(self).__name__, attr))
AttributeError: 'SeriesGroupBy' object has no attribute '_aggregate_item_by_item'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm 3.4.1\helpers\pydev\pydevd.py", line 1733, in <module>
    debugger.run(setup['file'], None, None)
  File "C:\Program Files (x86)\JetBrains\PyCharm 3.4.1\helpers\pydev\pydevd.py", line 1226, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files (x86)\JetBrains\PyCharm 3.4.1\helpers\pydev\_pydev_execfile.py", line 38, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc) #execute the script
  File "G:/Users/ari/Documents/Git/Misc/testbross.py", line 5, in <module>
    brwbt = KevinsDB.KevinToWBT(brdb.both[brdb.both.SYMBOL=='VCLT'])
  File "G:\Users\ari\Documents\Git\KevinsDB.py", line 95, in KevinToWBT
    s = grp.OrderTime.ffill().bfill()
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 572, in wrapper
    raise ValueError
ValueError
@jreback
Contributor

jreback commented Apr 22, 2015

Please supply an example that reproduces this, or show df.info() and df.head(), as well as pd.show_versions().

@jreback
Contributor

jreback commented Apr 30, 2015

Can you show a reproducible example and pd.show_versions()?

jreback added the Groupby and Missing-data labels Apr 30, 2015
@arijun
Author

arijun commented Apr 30, 2015

To get a reproducible example, I tried grouping manually:

for g in df.groupingColumn.unique():
    group = df[df.groupingColumn == g]
    group.groupby('groupingColumn').otherColumn.bfill()

but it didn't raise an exception when done that way.

I also tried splitting the data into smaller chunks to cut down its size, but couldn't get it much below half the original size.

If you can think of another way to get a minimal reproducing DataFrame, please let me know.
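One generic way to shrink a failing DataFrame is greedy chunk removal, in the spirit of delta debugging: drop a block of rows, keep the smaller frame if the error still occurs, and halve the block size until single rows have been tried. This is a sketch, not a pandas feature; `shrink_failing_frame` and `bfill_fails` are hypothetical names, and `col1`/`col2` are the column names from the original report. Rows are dropped by position rather than by label, so duplicate index labels (which appear to matter for this bug) are preserved.

```python
import numpy as np
import pandas as pd


def shrink_failing_frame(df: pd.DataFrame, fails) -> pd.DataFrame:
    """Greedily drop row blocks while `fails(frame)` keeps returning True.

    The result is smaller but not guaranteed to be the smallest failing frame.
    """
    if not fails(df):
        raise ValueError("the starting frame must reproduce the failure")
    chunk = max(len(df) // 2, 1)
    while True:
        start = 0
        while start < len(df):
            keep = np.ones(len(df), dtype=bool)
            keep[start:start + chunk] = False  # positional, so duplicate labels survive
            candidate = df.iloc[keep]
            if keep.any() and fails(candidate):
                df = candidate  # block not needed: drop it and retry this position
            else:
                start += chunk  # block needed (or frame would be empty): keep it
        if chunk == 1:
            return df
        chunk //= 2


# Hypothetical predicate for this report: does the grouped bfill raise?
def bfill_fails(frame: pd.DataFrame) -> bool:
    try:
        frame.groupby("col2")["col1"].bfill()
    except ValueError:
        return True
    return False
```

Usage would be `small = shrink_failing_frame(df, bfill_fails)`; printing `small.to_dict(orient="split")` gives something pasteable into the issue that keeps duplicate index labels, which the default `to_dict()` orientation would collapse.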

The output of pd.show_versions() was:

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.16.0
nose: 1.3.3
Cython: 0.20.1
numpy: 1.9.2
scipy: 0.15.1
statsmodels: None
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.2.1
dateutil: 2.4.2
pytz: 2015.2
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.8.0
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 0.9.7
pymysql: 0.6.2.None
psycopg2: None

@jreback
Contributor

jreback commented Apr 30, 2015

@arijun, you need to provide a constructed DataFrame and the code that reproduces the error, e.g.

df = DataFrame(.......)
df.groupby(...)......

@jorisvandenbossche
Member

@arijun How big is your original DataFrame? And could you share it?

@Marigold

This reproduces @arijun's error (using 0.16.2):

import pandas as pd

s = pd.Series([1,1,0], index=[1,2,2])
s.groupby(s).fillna(method='bfill')

It fails when calling reindex on a Series with a duplicate index. I'm not familiar enough with the pandas codebase to submit a PR for this, but let me know if I can help in any other way.
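The failure mode can be checked directly. The traceback shows `_concat_objects` calling `result.reindex(ax)`, and reindexing a Series whose index contains a duplicate label to a reordered target raises a ValueError. This sketch assumes a recent pandas, where the message wording may differ from 0.16's "cannot reindex from a duplicate axis". `grouped_bfill_positional` is a hypothetical workaround, not verified against 0.16: it runs the grouped backfill on a temporary positional index and restores the original labels afterwards.

```python
import numpy as np
import pandas as pd


def reindex_raises(series: pd.Series, target) -> bool:
    """Return True if reindexing `series` to `target` raises ValueError."""
    try:
        series.reindex(target)
    except ValueError:
        return True
    return False


def grouped_bfill_positional(values: pd.Series, keys) -> pd.Series:
    """Hypothetical workaround: backfill within groups on a temporary
    0..n-1 index so duplicate labels are never reindexed, then put the
    original (possibly duplicated) labels back."""
    original_index = values.index
    positional = values.reset_index(drop=True)             # unique RangeIndex
    filled = positional.groupby(np.asarray(keys)).bfill()  # keys matched by position
    filled.index = original_index                          # row order is unchanged
    return filled


# The Series from the report has the duplicate label 2, so a reordering
# reindex on that axis is rejected.
s = pd.Series([1, 1, 0], index=[1, 2, 2])
print(reindex_raises(s, [2, 2, 1]))  # True

vals = pd.Series([1.0, np.nan, np.nan, 2.0], index=[1, 2, 2, 3])
print(grouped_bfill_positional(vals, ["a", "a", "b", "b"]).tolist())
# [1.0, nan, 2.0, 2.0]
```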

jreback added the Bug label Sep 14, 2015
jreback added this to the Next Major Release milestone Sep 14, 2015
TomAugspurger added the Duplicate Report label Jan 31, 2018
TomAugspurger modified the milestones: Next Major Release, No action Jan 31, 2018
@TomAugspurger
Contributor

Duplicate of #19437

TomAugspurger marked this as a duplicate of #19437 Jan 31, 2018