Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: GH25495 incorrect dtype when using .loc to set Categorical value for column in 1-row DataFrame #29393

Merged

Conversation

keechongtan
Copy link
Contributor

@keechongtan keechongtan commented Nov 4, 2019

@@ -883,6 +883,14 @@ def setitem(self, indexer, value):
):
values[indexer] = value
try:
# GH25495 may need to convert to categorical block
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't put complicated things inside a try/except it tends to hide errors

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be much earlier in the comparisons

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved it out of the try/except.

However, the intended behaviour is that we only convert to categorical if it is a full assignment, and only change the values in a partial assignment. Hence the block creation needs to be done after we've checked the shape of the new values matches existing values.

@jreback jreback added Bug Categorical Categorical Data Type Indexing Related to indexing on series/frames, not to indexes themselves labels Nov 4, 2019
pandas/core/internals/blocks.py Outdated Show resolved Hide resolved
pandas/tests/frame/test_indexing.py Outdated Show resolved Hide resolved
pandas/tests/frame/test_indexing.py Outdated Show resolved Hide resolved
…ndexing-1row-df

* upstream/master: (109 commits)
  stronger typing in libreduction (pandas-dev#29502)
  API: rename labels to codes (pandas-dev#29509)
  CLN: remove unnecessary type checks (pandas-dev#29517)
  implement _BaseGrouper (pandas-dev#29520)
  CLN: F-string formatting in pandas/_libs/*.pyx (pandas-dev#29527)
  Fixed more SS03 errors (pandas-dev#29540)
  consolidate dim checks (pandas-dev#29536)
  REF: separate out _get_cython_func_and_vals (pandas-dev#29537)
  remove unnecessary exception (pandas-dev#29538)
  TST:Add test to check single category col returns series with single row slice (pandas-dev#29521)
  Make color validation more forgiving (pandas-dev#29122)
  DOC: update bottleneck repo and documentation urls (pandas-dev#29516)
  TST: add test for df construction from dict with tuples (pandas-dev#29497)
  add test for pd.melt dtypes preservation (pandas-dev#29510)
  updated DataFrame.equals docstring (pandas-dev#29496)
  Resolved merge conflicts (pandas-dev#29506)
  DOC: Improved pandas/compact/__init__.py (pandas-dev#29507)
  DOC: Update performance comparison section of io docs (pandas-dev#28890)
  TST: add test for df.where() with category dtype (pandas-dev#29454)
  DOC: Fix docs on merging categoricals. (pandas-dev#28185)
  ...
@jreback jreback added this to the 1.0 milestone Nov 16, 2019
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. can you add a whatsnew note in bug fixes / indexing section. ping on green.

@keechongtan
Copy link
Contributor Author

lgtm. can you add a whatsnew note in bug fixes / indexing section. ping on green.

Thanks @jreback. Added a whatsnew note.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls merge master and move the test function, otherwise lgtm. ping on green.

@@ -3849,6 +3849,16 @@ def test_assigning_ops(self):
df.loc[2:3, "b"] = Categorical(["b", "b"], categories=["a", "b"])
tm.assert_frame_equal(df, exp)

def test_setitem_single_row_categorical(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move to pandas/tests/frame/indexing/test_indexing.py

Copy link
Contributor Author

@keechongtan keechongtan Nov 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merged master, and moved to pandas/tests/frame/indexing/test_categorical.py. All tests passed @jreback

…ndexing-1row-df

* upstream/master: (185 commits)
  ENH: add BooleanArray extension array (pandas-dev#29555)
  DOC: Add link to dev calendar and meeting notes (pandas-dev#29737)
  ENH: Add built-in function for Styler to format the text displayed for missing values (pandas-dev#29118)
  DEPR: remove statsmodels/seaborn compat shims (pandas-dev#29822)
  DEPR: remove Index.summary (pandas-dev#29807)
  DEPR: passing an int to read_excel use_cols (pandas-dev#29795)
  STY: fstrings in io.pytables (pandas-dev#29758)
  BUG: Fix melt with mixed int/str columns (pandas-dev#29792)
  TST: add test for ffill/bfill for non unique multilevel (pandas-dev#29763)
  Changed description of parse_dates in read_excel(). (pandas-dev#29796)
  BUG: pivot_table not returning correct type when margin=True and aggfunc='mean'  (pandas-dev#28248)
  REF: Create _lib/window directory (pandas-dev#29817)
  Fixed small mistake (pandas-dev#29815)
  minor cleanups (pandas-dev#29798)
  DEPR: enforce deprecations in core.internals (pandas-dev#29723)
  add test for unused level raises KeyError (pandas-dev#29760)
  Add documentation linking to sqlalchemy (pandas-dev#29373)
  io/parsers: ensure decimal is str on PythonParser (pandas-dev#29743)
  Reenabled no-unused-function (pandas-dev#29767)
  CLN:F-string in pandas/_libs/tslibs/*.pyx (pandas-dev#29775)
  ...

# Conflicts:
#	pandas/tests/frame/indexing/test_indexing.py
…gtan/pandas into bug/categorical-indexing-1row-df

* 'bug/categorical-indexing-1row-df' of github.com:keechongtan/pandas:
  Update whatsnew entry
@pep8speaks
Copy link

pep8speaks commented Nov 25, 2019

Hello @keechongtan! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-01-27 11:52:36 UTC

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. can you merge master & ping on green.

…ndexing-1row-df

* upstream/master: (32 commits)
  DEPR: Series.cat.categorical (pandas-dev#29914)
  DEPR: infer_dtype default for skipna is now True (pandas-dev#29876)
  Fix broken asv (pandas-dev#29906)
  DEPR: Remove weekday_name (pandas-dev#29831)
  Fix mypy errors for pandas\tests\series\test_operators.py (pandas-dev#29826)
  CI: Setting path only once in GitHub Actions (pandas-dev#29867)
  DEPR: passing td64 data to DTA or dt64 data to TDA (pandas-dev#29794)
  CLN: remove unsupported sparse code from io.pytables (pandas-dev#29863)
  x.__class__ TO type(x) (pandas-dev#29889)
  DEPR: ftype, ftypes (pandas-dev#29895)
  REF: use named funcs instead of lambdas (pandas-dev#29841)
  Correct type inference for UInt64Index during access (pandas-dev#29420)
  CLN: follow-up to 29725 (pandas-dev#29890)
  CLN: trim unnecessary code in indexing tests (pandas-dev#29845)
  TST added test for groupby agg on mulitlevel column (pandas-dev#29772) (pandas-dev#29866)
  mypy fix (pandas-dev#29891)
  Typing annotations (pandas-dev#29850)
  Fix mypy error in pandas/tests.indexes.test_base.py (pandas-dev#29188)
  CLN: remove never-used kwargs, make kwargs explicit (pandas-dev#29873)
  TYP: Added typing to __eq__ functions (pandas-dev#29818)
  ...
@keechongtan
Copy link
Contributor Author

lgtm. can you merge master & ping on green.

@jreback, merged master and all tests passed

len(arr_value.shape)
and arr_value.shape[0] == values.shape[0]
and arr_value.size == values.size
self.is_categorical_astype(arr_value.dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use is_categorical_dtype(arr_value.dtype)

is_categorical_astype is not necessary here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the changes, and merged master. All tests passed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jreback, is the PR good to merge to master?

…ndexing-1row-df

* upstream/master: (49 commits)
  repr() (pandas-dev#29959)
  DOC : Typo fix in userguide/Styling (pandas-dev#29956)
  CLN: small things in pytables (pandas-dev#29958)
  API/DEPR: Change default skipna behaviour + deprecate numeric_only in Categorical.min and max (pandas-dev#27929)
  DEPR: DTI/TDI/PI constructor arguments (pandas-dev#29930)
  CLN: fix pytables passing too many kwargs (pandas-dev#29951)
  Typing (pandas-dev#29947)
  repr() (pandas-dev#29948)
  repr() (pandas-dev#29950)
  Added space at the end of the sentence (pandas-dev#29949)
  ENH: add NA scalar for missing value indicator, use in StringArray. (pandas-dev#29597)
  CLN: BlockManager.apply (pandas-dev#29825)
  TST: add test for rolling max/min/mean with DatetimeIndex over different frequencies (pandas-dev#29932)
  CLN: explicit signature for to_hdf (pandas-dev#29939)
  CLN: make kwargs explicit for pytables read_ methods (pandas-dev#29935)
  Convert core/indexes/base.py to f-strings (pandas-dev#29903)
  DEPR: dropna multiple axes, fillna int for td64, from_codes with floats, Series.nonzero (pandas-dev#29875)
  CLN: make kwargs explicit in pytables constructors (pandas-dev#29936)
  DEPR: tz_convert in the Timestamp constructor raises (pandas-dev#29929)
  STY: F-strings and repr (pandas-dev#29938)
  ...
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small change, pls merge master. ping on green.

len(arr_value.shape)
and arr_value.shape[0] == values.shape[0]
and arr_value.size == values.size
is_categorical_dtype(arr_value.dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put exact_match as the first condition

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback moved exact_match to be the first condition, and merged master, all tests passed.

…ndexing-1row-df

* upstream/master: (333 commits)
  CI: troubleshoot Web_and_Docs failing (pandas-dev#30534)
  WARN: Ignore NumbaPerformanceWarning in test suite (pandas-dev#30525)
  DEPR: camelCase in offsets, get_offset (pandas-dev#30340)
  PERF: implement scalar ops blockwise (pandas-dev#29853)
  DEPR: Remove Series.compress (pandas-dev#30514)
  ENH: Add numba engine for rolling apply (pandas-dev#30151)
  [ENH] Add to_markdown method (pandas-dev#30350)
  DEPR: Deprecate pandas.np module (pandas-dev#30386)
  ENH: Add ignore_index for df.drop_duplicates (pandas-dev#30405)
  BUG: The setting xrot=0 in DataFrame.hist() doesn't work with by and subplots pandas-dev#30288 (pandas-dev#30491)
  CI: Fix GBQ Tests (pandas-dev#30478)
  Bug groupby quantile listlike q and int columns (pandas-dev#30485)
  ENH: Add ignore_index for df.sort_values and series.sort_values (pandas-dev#30402)
  TYP: Typing hints in pandas/io/formats/{css,csvs}.py (pandas-dev#30398)
  BUG: raise on non-hashable Index name, closes pandas-dev#29069 (pandas-dev#30335)
  Replace "foo!r" to "repr(foo)" syntax pandas-dev#29886 (pandas-dev#30502)
  BUG: preserve EA dtype in transpose (pandas-dev#30091)
  BLD: add check to prevent tempita name error, clsoes pandas-dev#28836 (pandas-dev#30498)
  REF/TST: method-specific files for test_append (pandas-dev#30503)
  marked unused parameters (pandas-dev#30504)
  ...
len(arr_value.shape)
and arr_value.shape[0] == values.shape[0]
and arr_value.size == values.size
exact_match
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so i think its possible to greatly simplify this entire routine, but creating a Categorical.set_item. It would do this kind of test and if so just sets (like you have here), otherwise it can dispatch to the Block.set_item (this routine) for regular handling.

The logic in this routine (Block.set_item) then becomes a lot simpler.

now it might be slightly non-trivial to change this because .set_item does a lot, e.g. it preps the values AND then does the setting; these likely need to be decoupled to avoid duplicating some code.

lmk if this is doable and you want to work on it. ok with merging this if we can followup with the Categorical.set_item change (or just do that),.

Copy link
Contributor Author

@keechongtan keechongtan Jan 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jreback, sorry for the delay. I would prefer to merge this, and then follow up with the Categorical.set_item change. I'm happy on working on that as well.

Merged master, and updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k let's do that

@jreback jreback removed this from the 1.0 milestone Jan 3, 2020
@jreback
Copy link
Contributor

jreback commented Jan 9, 2020

if you can merge master and update would be great

…ndexing-1row-df

* upstream/master: (284 commits)
  CLN: leftover ix checks (pandas-dev#30951)
  CI: numpydev changed double to single quote (pandas-dev#30952)
  DOC: Fix whatsnew contributors section (pandas-dev#30926)
  STY: wrong placed space in strings (pandas-dev#30940)
  TYP: type up parts of series.py (pandas-dev#30761)
  DOC: Fix SS03 docstring error (pandas-dev#30939)
  CLN: remove unnecesary _date_check_type (pandas-dev#30932)
  DOC: Fixture docs in pandas/conftest.py (pandas-dev#30917)
  CLN: F-strings (pandas-dev#30916)
  replace syntax with f-string (pandas-dev#30919)
  BUG: pickle files left behind by tm.round_trip_pickle (pandas-dev#30906)
  TYP: offsets (pandas-dev#30897)
  TYP: typing annotations (pandas-dev#30901)
  WEB: Remove from roadmap moving the docstring script (pandas-dev#30893)
  WEB: Removing Discourse links (pandas-dev#30890)
  DOC: Encourage use of pre-commit in the docs (pandas-dev#30864)
  DEPR: fix missing stacklevel in pandas.core.index deprecation (pandas-dev#30878)
  CLN: remove unnecessary overriding in subclasses (pandas-dev#30875)
  RLS: 1.0.0rc0
  BUG: validate Index data is 1D + deprecate multi-dim indexing (pandas-dev#30588)
  ...

# Conflicts:
#	doc/source/whatsnew/v1.0.0.rst
@jreback jreback modified the milestones: 1.1, 1.0.0 Jan 26, 2020
len(arr_value.shape)
and arr_value.shape[0] == values.shape[0]
and arr_value.size == values.size
exact_match
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k let's do that

@jreback
Copy link
Contributor

jreback commented Jan 26, 2020

@keechongtan if you'd rebase once more we can merge.

@jreback
Copy link
Contributor

jreback commented Jan 26, 2020

ping when green.

…ndexing-1row-df

* upstream/master: (194 commits)
  DOC Remove Python 2 specific comments from documentation (pandas-dev#31198)
  Follow up PR: pandas-dev#28097 Simplify branch statement (pandas-dev#29243)
  BUG: DatetimeIndex.snap incorrectly setting freq (pandas-dev#31188)
  Move DataFrame.info() to live with similar functions (pandas-dev#31317)
  ENH: accept a dictionary in plot colors (pandas-dev#31071)
  PERF: add shortcut to Timestamp constructor (pandas-dev#30676)
  CLN/MAINT: Clean and annotate stata reader and writers (pandas-dev#31072)
  REF: define _get_slice_axis in correct classes (pandas-dev#31304)
  BUG: DataFrame.floordiv(ser, axis=0) not matching column-wise bheavior (pandas-dev#31271)
  PERF: optimize is_scalar, is_iterator (pandas-dev#31294)
  BUG: Series rolling count ignores min_periods (pandas-dev#30923)
  xfail sparse warning; closes pandas-dev#31310 (pandas-dev#31311)
  REF: DatetimeIndex.get_value wrap DTI.get_loc (pandas-dev#31314)
  CLN: internals.managers (pandas-dev#31316)
  PERF: avoid copies if possible in fill_binop (pandas-dev#31300)
  Add test for multiindex json (pandas-dev#31307)
  BUG: passing TDA and wrong freq to TimedeltaIndex (pandas-dev#31268)
  BUG: inconsistency between PeriodIndex.get_value vs get_loc (pandas-dev#31172)
  CLN: remove _set_subtyp (pandas-dev#31301)
  CI: Updated version of macos image (pandas-dev#31292)
  ...
@jreback jreback merged commit 0c50950 into pandas-dev:master Jan 27, 2020
@jreback
Copy link
Contributor

jreback commented Jan 27, 2020

thanks @keechongtan

happy to have a followup as discussed (pls create an issue)

@TomAugspurger
Copy link
Contributor

@jreback seems like this wasn't backported.

@meeseeksdev backport 1.0.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Categorical Categorical Data Type Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants