BUG: AttributeError: 'bool' object has no attribute 'to_numpy' in "mask_missing" method of core/missing.py #47101
Comments
Thanks @rainnnnny for the report. Labelling as a constructor issue, as I believe that the
>>> res._mgr.array
QList([ True, True, False, False])
As a workaround,
pandas.Series(
    data=np.asarray(qlist([True, True, False, False], qtype=1, adjust_dtype=False))
).replace(False, numpy.NaN)
gives
If there is a reason that you would want to retain the QList object in memory, you should investigate creating an Extension Array: https://pandas.pydata.org/pandas-docs/dev/development/extending.html#extensionarray
FYI, this is not only related to QList. new_mask ends up being [[False]] in case 2, but False in case 1.
Thanks @bcmyguest1 for this example. It does raise a warning
However, this example did work in pandas 1.3.5, so labelling as a regression pending further investigation.
import pandas as pd
# case 1
df = pd.DataFrame({"d": [pd.NA]})
print(df.replace("", pd.NA))
Note: the
moving to 1.4.4
The reason why this goes wrong is that
@jbrockmendel I've opened a PR #48313 but just checked for bool also instead (and also suppressed the numpy warnings while making changes here, see #47101 (comment))
removing milestone, xref #47485 (comment)
I've run into this on pandas 2.0/2.1 as well on line 118 of missing.py,
It definitely feels like a bug here. This is what appears to be happening for me:
Generally, a NumPy array (np.ndarray) overrides the == operator to compare elementwise, which is what is expected here. I think the hack above won't quite handle that enumeration, although it will at least not throw an exception. It would probably be best to check whether arr is in fact iterable and then convert it to a numpy array, or to run the comparison elementwise at the point where we call arr == x, since the fundamental assumption here is that arr performs that check on x elementwise and returns an array.
I think replacing arr == x with np.equal(arr, x) might do the trick with relative ease. I think np.equal will check if either argument is an array/iterable and do that comparison elementwise. Fundamentally, in my case, I feel like perhaps the issue might also partially fall on qpython.qcollect.QList, which claims to be an np.ndarray but doesn't implement the comparison elementwise.
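A rough sketch of the np.equal idea from the two comments above, not the change that went into pandas. The helper name elementwise_eq is hypothetical. It assumes that converting with np.asarray is acceptable, and that a comparison NumPy cannot perform elementwise (such as the np.equal([1,2,3], "") case mentioned later in this thread) should be treated as "no element matches".

```python
import numpy as np


def elementwise_eq(arr, x):
    """Hypothetical helper: always return a boolean mask shaped like ``arr``."""
    # Drop any ndarray subclass behaviour so a custom __eq__ cannot interfere.
    values = np.asarray(arr)
    try:
        mask = np.equal(values, x)
    except TypeError:
        # No matching comparison loop for these dtypes (e.g. ints vs. a str).
        return np.zeros(values.shape, dtype=bool)
    if mask is NotImplemented:
        # Some NumPy versions signal the same failure without raising.
        return np.zeros(values.shape, dtype=bool)
    return np.asarray(mask, dtype=bool)


print(elementwise_eq([True, True, False, False], False).tolist())
print(elementwise_eq(np.array([1, 2, 3]), 2).tolist())
print(elementwise_eq(np.array([1, 2, 3]), "").tolist())
```

The try/except is there because np.equal is stricter than == about mixed dtypes, so a bare swap of arr == x for np.equal(arr, x) can trade one exception for another.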
Bisecting two years ago ( pandas-dev#47101 (comment) ) shows this regression was introduced in b2d54d9 in 2021. Somehow this hasn't been patched since then. PR pandas-dev#48313 was supposed to address this, but the PR was closed and never merged and the bug has persisted.
* Fix issue pandas-dev#47101
* Add a test as per PR guidelines
* Fix typo
* Resolve inconsistent namespace as per PR test https://results.pre-commit.ci/run/github/858127/1723498369.6V12SWx7T-WpLZDAXXkz0Q This web UI commit will still fail, as the E501 line-too-long check will fail until the next commit
* Resolve E501 linting errors https://results.pre-commit.ci/run/github/858127/1723498369.6V12SWx7T-WpLZDAXXkz0Q
* Fix test TypeErrors np.equal([1,2,3], "") fails
* Quote style for Ruff
* typing and remove code backtick possibly incorrectly triggering ruff formatter
* mpy supression for caught error
* trailing space
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
new_mask = arr == x
This is expected to produce an array-like object with bool elements, like [False, False, True, ...], but when arr is a special array-like object, such as the qlist object in the example, the expression returns a single False, which is a bool object, so we get the exception shown above.
Expected Behavior
Returns an array-like object instead of a bool object.
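To illustrate the description above without depending on qpython, here is a minimal sketch. ScalarEqArray is a hypothetical stand-in, not the real QList; it only assumes an ndarray subclass whose == compares the whole container and returns one plain bool.

```python
import numpy as np


class ScalarEqArray(np.ndarray):
    """Hypothetical stand-in for an ndarray subclass with whole-object equality."""

    def __eq__(self, other):
        # Compare the container as a whole and return one plain Python bool,
        # instead of NumPy's usual elementwise boolean array.
        return bool(np.array_equal(np.asarray(self), np.asarray(other)))


arr = np.array([True, True, False, False]).view(ScalarEqArray)

# Same expression as in the description: the result is a bool, not a mask.
new_mask = arr == False  # noqa: E712
print(type(new_mask).__name__)        # bool
print(hasattr(new_mask, "to_numpy"))  # False

# Converting to a plain ndarray first restores the elementwise comparison.
plain_mask = np.asarray(arr) == False  # noqa: E712
print(plain_mask.tolist())            # [False, False, True, True]
```

Any later call such as new_mask.to_numpy(...) then fails with the AttributeError from the title, because a bool has no such method.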
Installed Versions
1.4.1