-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: GH25495 incorrect dtype when using .loc to set Categorical value for column in 1-row DataFrame #29393
BUG: GH25495 incorrect dtype when using .loc to set Categorical value for column in 1-row DataFrame #29393
Conversation
pandas/core/internals/blocks.py
Outdated
@@ -883,6 +883,14 @@ def setitem(self, indexer, value): | |||
): | |||
values[indexer] = value | |||
try: | |||
# GH25495 may need to convert to categorical block |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't put complicated things inside a try/except it tends to hide errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be much earlier in the comparisons
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved it out of the try/except.
However, the intended behaviour is that we only convert to categorical if it is a full assignment, and only change the values in a partial assignment. Hence the block creation needs to be done after we've checked the shape of the new values matches existing values.
…ndexing-1row-df * upstream/master: (109 commits) stronger typing in libreduction (pandas-dev#29502) API: rename labels to codes (pandas-dev#29509) CLN: remove unnecessary type checks (pandas-dev#29517) implement _BaseGrouper (pandas-dev#29520) CLN: F-string formatting in pandas/_libs/*.pyx (pandas-dev#29527) Fixed more SS03 errors (pandas-dev#29540) consolidate dim checks (pandas-dev#29536) REF: separate out _get_cython_func_and_vals (pandas-dev#29537) remove unnecessary exception (pandas-dev#29538) TST:Add test to check single category col returns series with single row slice (pandas-dev#29521) Make color validation more forgiving (pandas-dev#29122) DOC: update bottleneck repo and documentation urls (pandas-dev#29516) TST: add test for df construction from dict with tuples (pandas-dev#29497) add test for pd.melt dtypes preservation (pandas-dev#29510) updated DataFrame.equals docstring (pandas-dev#29496) Resolved merge conflicts (pandas-dev#29506) DOC: Improved pandas/compact/__init__.py (pandas-dev#29507) DOC: Update performance comparison section of io docs (pandas-dev#28890) TST: add test for df.where() with category dtype (pandas-dev#29454) DOC: Fix docs on merging categoricals. (pandas-dev#28185) ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. can you add a whatsnew note in bug fixes / indexing section. ping on green.
Thanks @jreback. Added a whatsnew note. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls merge master and move the test function, otherwise lgtm. ping on green.
pandas/tests/frame/test_indexing.py
Outdated
@@ -3849,6 +3849,16 @@ def test_assigning_ops(self): | |||
df.loc[2:3, "b"] = Categorical(["b", "b"], categories=["a", "b"]) | |||
tm.assert_frame_equal(df, exp) | |||
|
|||
def test_setitem_single_row_categorical(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move to pandas/tests/frame/indexing/test_indexing.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merged master, and moved to pandas/tests/frame/indexing/test_categorical.py. All tests passed @jreback
…ndexing-1row-df * upstream/master: (185 commits) ENH: add BooleanArray extension array (pandas-dev#29555) DOC: Add link to dev calendar and meeting notes (pandas-dev#29737) ENH: Add built-in function for Styler to format the text displayed for missing values (pandas-dev#29118) DEPR: remove statsmodels/seaborn compat shims (pandas-dev#29822) DEPR: remove Index.summary (pandas-dev#29807) DEPR: passing an int to read_excel use_cols (pandas-dev#29795) STY: fstrings in io.pytables (pandas-dev#29758) BUG: Fix melt with mixed int/str columns (pandas-dev#29792) TST: add test for ffill/bfill for non unique multilevel (pandas-dev#29763) Changed description of parse_dates in read_excel(). (pandas-dev#29796) BUG: pivot_table not returning correct type when margin=True and aggfunc='mean' (pandas-dev#28248) REF: Create _lib/window directory (pandas-dev#29817) Fixed small mistake (pandas-dev#29815) minor cleanups (pandas-dev#29798) DEPR: enforce deprecations in core.internals (pandas-dev#29723) add test for unused level raises KeyError (pandas-dev#29760) Add documentation linking to sqlalchemy (pandas-dev#29373) io/parsers: ensure decimal is str on PythonParser (pandas-dev#29743) Reenabled no-unused-function (pandas-dev#29767) CLN:F-string in pandas/_libs/tslibs/*.pyx (pandas-dev#29775) ... # Conflicts: # pandas/tests/frame/indexing/test_indexing.py
…gtan/pandas into bug/categorical-indexing-1row-df * 'bug/categorical-indexing-1row-df' of github.com:keechongtan/pandas: Update whatsnew entry
Hello @keechongtan! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-01-27 11:52:36 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. can you merge master & ping on green.
…ndexing-1row-df * upstream/master: (32 commits) DEPR: Series.cat.categorical (pandas-dev#29914) DEPR: infer_dtype default for skipna is now True (pandas-dev#29876) Fix broken asv (pandas-dev#29906) DEPR: Remove weekday_name (pandas-dev#29831) Fix mypy errors for pandas\tests\series\test_operators.py (pandas-dev#29826) CI: Setting path only once in GitHub Actions (pandas-dev#29867) DEPR: passing td64 data to DTA or dt64 data to TDA (pandas-dev#29794) CLN: remove unsupported sparse code from io.pytables (pandas-dev#29863) x.__class__ TO type(x) (pandas-dev#29889) DEPR: ftype, ftypes (pandas-dev#29895) REF: use named funcs instead of lambdas (pandas-dev#29841) Correct type inference for UInt64Index during access (pandas-dev#29420) CLN: follow-up to 29725 (pandas-dev#29890) CLN: trim unnecessary code in indexing tests (pandas-dev#29845) TST added test for groupby agg on mulitlevel column (pandas-dev#29772) (pandas-dev#29866) mypy fix (pandas-dev#29891) Typing annotations (pandas-dev#29850) Fix mypy error in pandas/tests.indexes.test_base.py (pandas-dev#29188) CLN: remove never-used kwargs, make kwargs explicit (pandas-dev#29873) TYP: Added typing to __eq__ functions (pandas-dev#29818) ...
@jreback, merged master and all tests passed |
pandas/core/internals/blocks.py
Outdated
len(arr_value.shape) | ||
and arr_value.shape[0] == values.shape[0] | ||
and arr_value.size == values.size | ||
self.is_categorical_astype(arr_value.dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use is_categorical_dtype(arr_value.dtype)
is_categorical_astype is not necessary here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the changes, and merged master. All tests passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jreback, is the PR good to merge to master?
…ndexing-1row-df * upstream/master: (49 commits) repr() (pandas-dev#29959) DOC : Typo fix in userguide/Styling (pandas-dev#29956) CLN: small things in pytables (pandas-dev#29958) API/DEPR: Change default skipna behaviour + deprecate numeric_only in Categorical.min and max (pandas-dev#27929) DEPR: DTI/TDI/PI constructor arguments (pandas-dev#29930) CLN: fix pytables passing too many kwargs (pandas-dev#29951) Typing (pandas-dev#29947) repr() (pandas-dev#29948) repr() (pandas-dev#29950) Added space at the end of the sentence (pandas-dev#29949) ENH: add NA scalar for missing value indicator, use in StringArray. (pandas-dev#29597) CLN: BlockManager.apply (pandas-dev#29825) TST: add test for rolling max/min/mean with DatetimeIndex over different frequencies (pandas-dev#29932) CLN: explicit signature for to_hdf (pandas-dev#29939) CLN: make kwargs explicit for pytables read_ methods (pandas-dev#29935) Convert core/indexes/base.py to f-strings (pandas-dev#29903) DEPR: dropna multiple axes, fillna int for td64, from_codes with floats, Series.nonzero (pandas-dev#29875) CLN: make kwargs explicit in pytables constructors (pandas-dev#29936) DEPR: tz_convert in the Timestamp constructor raises (pandas-dev#29929) STY: F-strings and repr (pandas-dev#29938) ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small change, pls merge master. ping on green.
pandas/core/internals/blocks.py
Outdated
len(arr_value.shape) | ||
and arr_value.shape[0] == values.shape[0] | ||
and arr_value.size == values.size | ||
is_categorical_dtype(arr_value.dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put exact_match as the first condition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback moved exact_match to be the first condition, and merged master, all tests passed.
…ndexing-1row-df * upstream/master: (333 commits) CI: troubleshoot Web_and_Docs failing (pandas-dev#30534) WARN: Ignore NumbaPerformanceWarning in test suite (pandas-dev#30525) DEPR: camelCase in offsets, get_offset (pandas-dev#30340) PERF: implement scalar ops blockwise (pandas-dev#29853) DEPR: Remove Series.compress (pandas-dev#30514) ENH: Add numba engine for rolling apply (pandas-dev#30151) [ENH] Add to_markdown method (pandas-dev#30350) DEPR: Deprecate pandas.np module (pandas-dev#30386) ENH: Add ignore_index for df.drop_duplicates (pandas-dev#30405) BUG: The setting xrot=0 in DataFrame.hist() doesn't work with by and subplots pandas-dev#30288 (pandas-dev#30491) CI: Fix GBQ Tests (pandas-dev#30478) Bug groupby quantile listlike q and int columns (pandas-dev#30485) ENH: Add ignore_index for df.sort_values and series.sort_values (pandas-dev#30402) TYP: Typing hints in pandas/io/formats/{css,csvs}.py (pandas-dev#30398) BUG: raise on non-hashable Index name, closes pandas-dev#29069 (pandas-dev#30335) Replace "foo!r" to "repr(foo)" syntax pandas-dev#29886 (pandas-dev#30502) BUG: preserve EA dtype in transpose (pandas-dev#30091) BLD: add check to prevent tempita name error, clsoes pandas-dev#28836 (pandas-dev#30498) REF/TST: method-specific files for test_append (pandas-dev#30503) marked unused parameters (pandas-dev#30504) ...
len(arr_value.shape) | ||
and arr_value.shape[0] == values.shape[0] | ||
and arr_value.size == values.size | ||
exact_match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so i think its possible to greatly simplify this entire routine, but creating a Categorical.set_item. It would do this kind of test and if so just sets (like you have here), otherwise it can dispatch to the Block.set_item (this routine) for regular handling.
The logic in this routine (Block.set_item) then becomes a lot simpler.
now it might be slightly non-trivial to change this because .set_item does a lot, e.g. it preps the values AND then does the setting; these likely need to be decoupled to avoid duplicating some code.
lmk if this is doable and you want to work on it. ok with merging this if we can followup with the Categorical.set_item change (or just do that),.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jreback, sorry for the delay. I would prefer to merge this, and then follow up with the Categorical.set_item change. I'm happy on working on that as well.
Merged master, and updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k let's do that
if you can merge master and update would be great |
…ndexing-1row-df * upstream/master: (284 commits) CLN: leftover ix checks (pandas-dev#30951) CI: numpydev changed double to single quote (pandas-dev#30952) DOC: Fix whatsnew contributors section (pandas-dev#30926) STY: wrong placed space in strings (pandas-dev#30940) TYP: type up parts of series.py (pandas-dev#30761) DOC: Fix SS03 docstring error (pandas-dev#30939) CLN: remove unnecesary _date_check_type (pandas-dev#30932) DOC: Fixture docs in pandas/conftest.py (pandas-dev#30917) CLN: F-strings (pandas-dev#30916) replace syntax with f-string (pandas-dev#30919) BUG: pickle files left behind by tm.round_trip_pickle (pandas-dev#30906) TYP: offsets (pandas-dev#30897) TYP: typing annotations (pandas-dev#30901) WEB: Remove from roadmap moving the docstring script (pandas-dev#30893) WEB: Removing Discourse links (pandas-dev#30890) DOC: Encourage use of pre-commit in the docs (pandas-dev#30864) DEPR: fix missing stacklevel in pandas.core.index deprecation (pandas-dev#30878) CLN: remove unnecessary overriding in subclasses (pandas-dev#30875) RLS: 1.0.0rc0 BUG: validate Index data is 1D + deprecate multi-dim indexing (pandas-dev#30588) ... # Conflicts: # doc/source/whatsnew/v1.0.0.rst
len(arr_value.shape) | ||
and arr_value.shape[0] == values.shape[0] | ||
and arr_value.size == values.size | ||
exact_match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k let's do that
@keechongtan if you'd rebase once more we can merge. |
ping when green. |
…ndexing-1row-df * upstream/master: (194 commits) DOC Remove Python 2 specific comments from documentation (pandas-dev#31198) Follow up PR: pandas-dev#28097 Simplify branch statement (pandas-dev#29243) BUG: DatetimeIndex.snap incorrectly setting freq (pandas-dev#31188) Move DataFrame.info() to live with similar functions (pandas-dev#31317) ENH: accept a dictionary in plot colors (pandas-dev#31071) PERF: add shortcut to Timestamp constructor (pandas-dev#30676) CLN/MAINT: Clean and annotate stata reader and writers (pandas-dev#31072) REF: define _get_slice_axis in correct classes (pandas-dev#31304) BUG: DataFrame.floordiv(ser, axis=0) not matching column-wise bheavior (pandas-dev#31271) PERF: optimize is_scalar, is_iterator (pandas-dev#31294) BUG: Series rolling count ignores min_periods (pandas-dev#30923) xfail sparse warning; closes pandas-dev#31310 (pandas-dev#31311) REF: DatetimeIndex.get_value wrap DTI.get_loc (pandas-dev#31314) CLN: internals.managers (pandas-dev#31316) PERF: avoid copies if possible in fill_binop (pandas-dev#31300) Add test for multiindex json (pandas-dev#31307) BUG: passing TDA and wrong freq to TimedeltaIndex (pandas-dev#31268) BUG: inconsistency between PeriodIndex.get_value vs get_loc (pandas-dev#31172) CLN: remove _set_subtyp (pandas-dev#31301) CI: Updated version of macos image (pandas-dev#31292) ...
thanks @keechongtan happy to have a followup as discussed (pls create an issue) |
@jreback seems like this wasn't backported. @meeseeksdev backport 1.0.x |
… .loc to set Categorical value for column in 1-row DataFrame
…et Categorical value for column in 1-row DataFrame (#31393) Co-authored-by: Kee Chong Tan <[email protected]>
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff