BUG: pivot_table with margins=True fails for categorical dtype #10989

jakevdp · 2015-09-04T16:26:01Z

First, an example that works as expected (non-categorical):

In [22]: pd.__version__
Out[22]: '0.16.2'

In [23]: data = pd.DataFrame({'x': np.arange(99),
                     'y': np.arange(99) // 50,
                     'z': np.arange(99) % 3})

In [24]: data.pivot_table('x', 'y', 'z')
Out[24]: 
z     0     1     2
y                  
0  24.0  25.0  24.5
1  73.5  74.5  74.0

In [25]: data.pivot_table('x', 'y', 'z', margins=True)
Out[25]: 
z       0     1     2   All
y                          
0    24.0  25.0  24.5  24.5
1    73.5  74.5  74.0  74.0
All  48.0  49.0  50.0  49.0

Now convert y and z to categories; pivot_table works without margins but fails with margins=True:

In [27]: data.y = data.y.astype('category')

In [28]: data.z = data.z.astype('category')

In [29]: data.pivot_table('x', 'y', 'z')
Out[29]: 
z     0     1     2
y                  
0  24.0  25.0  24.5
1  73.5  74.5  74.0

In [32]: data.pivot_table('x', 'y', 'z', margins=True)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/Users/jakevdp/anaconda/envs/py3k/lib/python3.3/site-packages/pandas/core/internals.py in set(self, item, value, check)
   2979         try:
-> 2980             loc = self.items.get_loc(item)
   2981         except KeyError:

/Users/jakevdp/anaconda/envs/py3k/lib/python3.3/site-packages/pandas/core/index.py in get_loc(self, key, method)
   5072             key = tuple(map(_maybe_str_to_time_stamp, key, self.levels))
-> 5073             return self._engine.get_loc(key)
   5074 

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3824)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3704)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12280)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12231)()

KeyError: ('x', 'All')

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-32-7436e0e1c9bb> in <module>()
----> 1 data.pivot_table('x', 'y', 'z', margins=True)

/Users/jakevdp/anaconda/envs/py3k/lib/python3.3/site-packages/pandas/tools/pivot.py in pivot_table(data, values, index, columns, aggfunc, fill_value, margins, dropna)
    141     if margins:
    142         table = _add_margins(table, data, values, rows=index,
--> 143                              cols=columns, aggfunc=aggfunc)
    144 
    145     # discard the top level

/Users/jakevdp/anaconda/envs/py3k/lib/python3.3/site-packages/pandas/tools/pivot.py in _add_margins(table, data, values, rows, cols, aggfunc)
    167 
    168     if values:
--> 169         marginal_result_set = _generate_marginal_results(table, data, values, rows, cols, aggfunc, grand_margin)
    170         if not isinstance(marginal_result_set, tuple):
    171             return marginal_result_set

/Users/jakevdp/anaconda/envs/py3k/lib/python3.3/site-packages/pandas/tools/pivot.py in _generate_marginal_results(table, data, values, rows, cols, aggfunc, grand_margin)
    236                 # we are going to mutate this, so need to copy!
    237                 piece = piece.copy()
--> 238                 piece[all_key] = margin[key]
    239 
    240                 table_pieces.append(piece)

/Users/jakevdp/anaconda/envs/py3k/lib/python3.3/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
   2125         else:
   2126             # set column
-> 2127             self._set_item(key, value)
   2128 
   2129     def _setitem_slice(self, key, value):

/Users/jakevdp/anaconda/envs/py3k/lib/python3.3/site-packages/pandas/core/frame.py in _set_item(self, key, value)
   2203         self._ensure_valid_index(value)
   2204         value = self._sanitize_column(key, value)
-> 2205         NDFrame._set_item(self, key, value)
   2206 
   2207         # check if we are modifying a copy

/Users/jakevdp/anaconda/envs/py3k/lib/python3.3/site-packages/pandas/core/generic.py in _set_item(self, key, value)
   1194 
   1195     def _set_item(self, key, value):
-> 1196         self._data.set(key, value)
   1197         self._clear_item_cache()
   1198 

/Users/jakevdp/anaconda/envs/py3k/lib/python3.3/site-packages/pandas/core/internals.py in set(self, item, value, check)
   2981         except KeyError:
   2982             # This item wasn't present, just insert at end
-> 2983             self.insert(len(self.items), item, value)
   2984             return
   2985 

/Users/jakevdp/anaconda/envs/py3k/lib/python3.3/site-packages/pandas/core/internals.py in insert(self, loc, item, value, allow_duplicates)
   3100             self._blknos = np.insert(self._blknos, loc, len(self.blocks))
   3101 
-> 3102         self.axes[0] = self.items.insert(loc, item)
   3103 
   3104         self.blocks += (block,)

/Users/jakevdp/anaconda/envs/py3k/lib/python3.3/site-packages/pandas/core/index.py in insert(self, loc, item)
   5583                 # other labels
   5584                 lev_loc = len(level)
-> 5585                 level = level.insert(lev_loc, k)
   5586             else:
   5587                 lev_loc = level.get_loc(k)

/Users/jakevdp/anaconda/envs/py3k/lib/python3.3/site-packages/pandas/core/index.py in insert(self, loc, item)
   3217         code = self.categories.get_indexer([item])
   3218         if (code == -1):
-> 3219             raise TypeError("cannot insert an item into a CategoricalIndex that is not already an existing category")
   3220 
   3221         codes = self.codes

TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category

jreback · 2015-09-04T16:30:59Z

prob related to #9534

jakevdp · 2015-09-04T17:36:32Z

I think this is different from #9534: the issue here is that the table has a categorical index and categorical columns, so when you try to add a new "All" column/row, you get an error ("All" is not a valid category).

I see one of two fixes here: convert categorical indices to object indices, or create a new categorical index with a new valid category "All".
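To make the two options concrete, here is a minimal sketch at the index level. It does not show the actual patch; the names `as_object` and `as_categorical` are made up for illustration, and it only assumes that `CategoricalIndex.add_categories` and `Index.insert` behave as documented.

```python
import pandas as pd

# A categorical index like the one pivot_table builds from a categorical 'y'.
idx = pd.CategoricalIndex([0, 1], name='y')

# Option 1: fall back to an object index, then append the margin label.
as_object = idx.astype(object).insert(len(idx), 'All')

# Option 2: register 'All' as a valid category first, then append it.
as_categorical = idx.add_categories(['All']).insert(len(idx), 'All')

print(as_object)
print(as_categorical)
```

Option 1 loses the categorical dtype but matches what already happens for integer indices; option 2 keeps the dtype at the cost of adding a category that is not in the original data.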

jreback · 2015-09-04T17:37:38Z

yeah, should create a new cat index and just add 'All'

jreback · 2015-10-19T11:39:44Z

replaced by #11371

jankatins · 2015-10-19T13:54:03Z

IMO the solution should be to change the categorical index to an object index, as happens with the integer index:

>>> data.pivot_table('x', 'y', 'z').index
Int64Index([0, 1], dtype='int64', name='y')
>>> data.pivot_table('x', 'y', 'z', margins=True).index
Index([0, 1, 'All'], dtype='object', name='y')
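Until a fix lands, a possible user-side workaround along the same lines is to cast the categorical keys to object before pivoting. This is only a sketch, assuming `DataFrame.assign` is available; the name `plain` is made up here.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'x': np.arange(99),
                     'y': np.arange(99) // 50,
                     'z': np.arange(99) % 3})
data['y'] = data['y'].astype('category')
data['z'] = data['z'].astype('category')

# Cast the categorical keys to object so that the 'All' margin label
# can be appended to the resulting index and columns.
plain = data.assign(y=data['y'].astype(object),
                    z=data['z'].astype(object))
table = plain.pivot_table(values='x', index='y', columns='z', margins=True)

print(table.index)
```

The original `data` keeps its categorical columns; only the temporary copy passed to `pivot_table` is converted.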

BUG: quick fix for pandas-dev#10989

TST: add test case from Issue pandas-dev#10989

API: add _to_safe_for_reshape to allow safe insert/append with embedded CategoricalIndexes

jakevdp changed the title ~~pivot_table with margins=True fails for categorical dtype~~ BUG: pivot_table with margins=True fails for categorical dtype Sep 4, 2015

jreback added the Bug, Prio-medium, Reshaping, and Categorical labels Sep 4, 2015

jreback added this to the 0.17.0 milestone Sep 4, 2015

jreback modified the milestones: Next Major Release, 0.17.0 Sep 4, 2015

jakevdp added a commit to jakevdp/pandas that referenced this issue Sep 4, 2015

BUG: quick fix for pandas-dev#10989

2b04d9f

jakevdp mentioned this issue Sep 4, 2015

BUG: pivot_table with margins=True fails for categorical dtype, #10989 #10993

Closed

jakevdp added a commit to jakevdp/pandas that referenced this issue Sep 4, 2015

TST: add test case from Issue pandas-dev#10989

74cac0e

jreback modified the milestones: 0.17.1, Next Major Release Oct 18, 2015

jreback pushed a commit to jreback/pandas that referenced this issue Oct 19, 2015

BUG: quick fix for pandas-dev#10989

d998337

TST: add test case from Issue pandas-dev#10989

jreback mentioned this issue Oct 19, 2015

BUG: pivot table bug with Categorical indexes, #10993 #11371

Merged

jreback closed this as completed Oct 19, 2015
