Value error on groupby bfill() #9970

arijun · 2015-04-22T19:47:10Z

s = pd.Series([1,1,0], index=[1,2,2])
s.groupby(s).fillna(method='bfill')

I have a DataFrame with a column containing some strings and NaNs. I can do df.col.bfill(), but when I try df.groupby('col2').col1.bfill() I get a ValueError.
Calling first and apply(lambda x: x is str) works without error.

Here is the full traceback:

Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 558, in wrapper
    return self.apply(curried_with_axis)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 662, in apply
    return self._python_apply_general(f)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 669, in _python_apply_general
    not_indexed_same=mutated)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 2380, in _wrap_applied_output
    not_indexed_same=not_indexed_same)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 1142, in _concat_objects
    result = result.reindex(ax)
  File "C:\Anaconda3\lib\site-packages\pandas\core\series.py", line 2149, in reindex
    return super(Series, self).reindex(index=index, **kwargs)
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1731, in reindex
    method, fill_value, copy).__finalize__(self)
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1749, in _reindex_axes
    allow_dups=False)
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1834, in _reindex_with_indexers
    copy=copy)
  File "C:\Anaconda3\lib\site-packages\pandas\core\internals.py", line 3150, in reindex_indexer
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 561, in wrapper
    return self.apply(curried)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 662, in apply
    return self._python_apply_general(f)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 669, in _python_apply_general
    not_indexed_same=mutated)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 2380, in _wrap_applied_output
    not_indexed_same=not_indexed_same)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 1142, in _concat_objects
    result = result.reindex(ax)
  File "C:\Anaconda3\lib\site-packages\pandas\core\series.py", line 2149, in reindex
    return super(Series, self).reindex(index=index, **kwargs)
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1731, in reindex
    method, fill_value, copy).__finalize__(self)
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1749, in _reindex_axes
    allow_dups=False)
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1834, in _reindex_with_indexers
    copy=copy)
  File "C:\Anaconda3\lib\site-packages\pandas\core\internals.py", line 3150, in reindex_indexer
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 570, in wrapper
    return self._aggregate_item_by_item(name, *args, **kwargs)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 511, in __getattr__
    (type(self).__name__, attr))
AttributeError: 'SeriesGroupBy' object has no attribute '_aggregate_item_by_item'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm 3.4.1\helpers\pydev\pydevd.py", line 1733, in <module>
    debugger.run(setup['file'], None, None)
  File "C:\Program Files (x86)\JetBrains\PyCharm 3.4.1\helpers\pydev\pydevd.py", line 1226, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files (x86)\JetBrains\PyCharm 3.4.1\helpers\pydev\_pydev_execfile.py", line 38, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc) #execute the script
  File "G:/Users/ari/Documents/Git/Misc/testbross.py", line 5, in <module>
    brwbt = KevinsDB.KevinToWBT(brdb.both[brdb.both.SYMBOL=='VCLT'])
  File "G:\Users\ari\Documents\Git\KevinsDB.py", line 95, in KevinToWBT
    s = grp.OrderTime.ffill().bfill()
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 572, in wrapper
    raise ValueError
ValueError

jreback · 2015-04-22T21:05:20Z

Please supply an example that reproduces this, or
show df.info() and df.head(),
as well as pd.show_versions()

jreback · 2015-04-30T10:16:44Z

can you show a reproducible example and pd.show_versions()

arijun · 2015-04-30T14:45:20Z

To get a reproducible example I tried manually grouping:

for g in df.groupingColumn.unique():
    group = df[df.groupingColumn == g]
    group.groupby('groupingColumn').otherColumn.bfill()

but it didn't cause an exception when I did it like that.

I also tried to split it into smaller chunks to cut down the size but couldn't cut it much more than half the original size.

If you can think of another method to get a minimal reproducing DataFrame please let me know.

The output of pd.show_versions() was

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.16.0
nose: 1.3.3
Cython: 0.20.1
numpy: 1.9.2
scipy: 0.15.1
statsmodels: None
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.2.1
dateutil: 2.4.2
pytz: 2015.2
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.8.0
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 0.9.7
pymysql: 0.6.2.None
psycopg2: None

jreback · 2015-04-30T15:08:36Z

@arijun you need to provide a constructed dataframe and your code that repros e.g.

df = DataFrame(.......)
df.groupby(...)......

jorisvandenbossche · 2015-04-30T16:54:37Z

@arijun How big is your original dataframe? And can you share it?

Marigold · 2015-09-14T10:15:46Z

This reproduces @arijun's error (using 0.16.2):

s = pd.Series([1,1,0], index=[1,2,2])
s.groupby(s).fillna(method='bfill')

It fails when reindex is called on a Series with a duplicate index. I'm not familiar enough with the pandas codebase to submit a PR for this, but let me know if I can help in any other way.
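For readers who want to see the underlying failure in isolation, here is a minimal sketch. It assumes the root cause is the one described above (reindex being called on a Series whose index contains duplicates), and the workaround of grouping on a temporary unique index is only a suggestion, not something confirmed in this thread. The exact error message differs between pandas versions, so only the exception type is checked.

```python
import pandas as pd

# Same data as the repro above: the index label 2 appears twice.
s = pd.Series([1, 1, 0], index=[1, 2, 2])

# Reindexing a Series with duplicate index labels raises ValueError.
try:
    s.reindex([1, 2])
    raised = False
except ValueError as exc:
    raised = True
    print("reindex failed:", exc)

# Possible workaround (an assumption, not from the thread):
# group on a temporary unique index, then restore the original labels.
unique = s.reset_index(drop=True)
filled = unique.groupby(unique).bfill()
filled.index = s.index
print(filled)
```

Since the result keeps the original row order, putting the duplicate labels back afterwards is a plain index assignment.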

TomAugspurger · 2018-01-31T15:18:45Z

Duplicate of #19437

jreback added the Groupby and Missing-data labels Apr 30, 2015

jreback added the Bug label Sep 14, 2015

jreback added this to the Next Major Release milestone Sep 14, 2015

TomAugspurger added the Duplicate Report label Jan 31, 2018

TomAugspurger modified the milestones: Next Major Release, No action Jan 31, 2018

TomAugspurger marked this as a duplicate of #19437 Jan 31, 2018

TomAugspurger closed this as completed Jan 31, 2018
