BUG: AttributeError: 'bool' object has no attribute 'to_numpy' in "mask_missing" method of core/missing.py #47101

rainnnnny · 2022-05-24T09:44:42Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas
import numpy
from qpython.qcollection import qlist

res = pandas.Series(data=qlist([True, True, False, False], qtype=1, adjust_dtype=False)).replace(False, numpy.nan)

Issue Description

new_mask = arr == x

This comparison is expected to return an array-like object of bool elements, like [False, False, True, ...]. But when arr is a special array-like object, like the qlist object in the example, the expression returns a single False, which is a bool object, so we get the exception shown above.

Expected Behavior

returns an array-like object instead of a bool object
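
For readers without qpython installed, here is a minimal sketch of the failure mode. ScalarEqArray is a hypothetical stand-in, not QList's actual implementation: it only assumes an ndarray subclass whose == compares whole objects and returns one bool.

```python
import numpy as np


class ScalarEqArray(np.ndarray):
    """Hypothetical stand-in for QList: an ndarray subclass whose == returns one bool."""

    def __eq__(self, other):
        # Whole-object comparison instead of NumPy's elementwise comparison.
        return isinstance(other, np.ndarray) and np.array_equal(
            np.asarray(self), np.asarray(other)
        )


arr = np.array([True, True, False, False]).view(ScalarEqArray)
new_mask = arr == False  # noqa: E712  (same shape as the check in mask_missing)

print(type(new_mask), new_mask)       # a plain bool, not a boolean mask
print(hasattr(new_mask, "to_numpy"))  # a bool has no to_numpy attribute
```

Any later call such as new_mask.to_numpy(...) would then raise the AttributeError from the title.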

Installed Versions

1.4.1

simonjayhawkins · 2022-05-24T10:51:35Z

Thanks @rainnnnny for the report.

Labelling as a constructor issue: I believe the pd.Series constructor should convert/materialize array-likes passed as data into standard NumPy arrays for in-memory storage, rather than storing the QList object directly.

>>> res._mgr.array
QList([ True,  True, False, False])

as a workaround

pandas.Series(
    data=numpy.asarray(qlist([True, True, False, False], qtype=1, adjust_dtype=False))
).replace(False, numpy.nan)

gives

0    True
1    True
2     NaN
3     NaN
dtype: object

If there is a reason that you want to retain the QList object in memory, you should investigate creating an Extension Array: https://pandas.pydata.org/pandas-docs/dev/development/extending.html#extensionarray
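
A small sketch of why the asarray workaround helps, assuming only that the object is an ndarray subclass with a scalar-returning ==. WholeObjectEq is a hypothetical class for illustration, not QList itself. np.asarray returns a base-class ndarray, while np.asanyarray would let the subclass through.

```python
import numpy as np


class WholeObjectEq(np.ndarray):
    """Hypothetical ndarray subclass (not QList itself) with a scalar-returning ==."""

    def __eq__(self, other):
        return False  # always a single bool, never a mask


sub = np.array([True, True, False, False]).view(WholeObjectEq)

plain = np.asarray(sub)    # base-class ndarray, subclass behaviour dropped
kept = np.asanyarray(sub)  # subclass passes through unchanged

print(type(plain).__name__, type(kept).__name__)
print(plain == False)  # noqa: E712  elementwise mask again
```

Once the data is a plain ndarray, the comparison inside replace produces a mask rather than a single bool.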

bcmyguest1 · 2022-05-26T20:39:56Z

import pandas as pd
# case 1
df = pd.DataFrame({"d":[pd.NA]})
df.replace('',pd.NA) # throws the same error
# case 2
df = pd.DataFrame({"d":[None]})
df.replace('',pd.NA) # works fine

FYI this is not only related to QList.

new_mask ends up being [[False]] in case 2, but False in case 1.
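
A possible reading of the difference: None == '' is simply False, while pd.NA == '' evaluates to pd.NA, which has no unambiguous truth value. The sketch below illustrates a per-element comparison that treats such ambiguous results as non-matches. AmbiguousNA is a hypothetical stand-in for pd.NA (so the block runs without pandas) and elementwise_mask is an illustrative helper, not what pandas does internally.

```python
import numpy as np


class AmbiguousNA:
    """Hypothetical stand-in for pd.NA: comparisons return the NA itself."""

    def __eq__(self, other):
        return self

    def __bool__(self):
        raise TypeError("boolean value of NA is ambiguous")


def elementwise_mask(values, x):
    """Compare each element to x; only a definite True counts as a match."""
    mask = np.zeros(len(values), dtype=bool)
    for i, value in enumerate(values):
        result = value == x
        # Anything that is not a real boolean (e.g. NA) is left as False.
        if isinstance(result, (bool, np.bool_)):
            mask[i] = bool(result)
    return mask


NA = AmbiguousNA()
print(elementwise_mask([NA], ""))           # NA never matches
print(elementwise_mask([None], ""))         # None == "" is False
print(elementwise_mask(["", "a", NA], ""))  # only the empty string matches
```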

simonjayhawkins · 2022-05-27T12:30:53Z

> FYI this is not only related to QList.

Thanks @bcmyguest1 for this example.

It does raise a warning, "FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison", so I guess this issue will resolve itself in a future pandas.

However, this example did work in pandas 1.3.5, so labelling as regression pending further investigation.

import pandas as pd

# case 1
df = pd.DataFrame({"d": [pd.NA]})
print(df.replace("", pd.NA))

      d
0  <NA>

Note: the FutureWarning was also given in pandas 1.3.5, but I think that it should probably have been suppressed.

simonjayhawkins · 2022-05-27T13:12:48Z

> However, this example did work in pandas 1.3.5, so labelling as regression pending further investigation.

first bad commit: [b2d54d9] BUG: IntegerArray/FloatingArray ufunc with 'out' kwd (#45122)

cc @jbrockmendel

jbrockmendel · 2022-05-27T16:31:40Z

1. agreed with Simon that qlist shouldn't be held directly, or should be made an EA
2. if it is made an EA, arr == x should be array-like
3. we can change the check for ndarray to be specific to EA/BooleanArray
3b) this goes in the pile of bugs created indirectly by pd.NA-workarounds

simonjayhawkins · 2022-06-22T13:57:28Z

moving to 1.4.4

CloseChoice · 2022-08-21T15:19:42Z

The reason this goes wrong is that qlist inherits from np.ndarray, so all our checks for this pass and the qlist is stored directly. Maybe we could adjust sanitize_array a bit to handle qlists correctly, but maybe we can't get around an explicit cast to np.ndarray.

simonjayhawkins · 2022-08-30T13:49:18Z

> 3. we can change the check for ndarray to be specific to EA/BooleanArray
> 3b) this goes in the pile of bugs created indirectly by pd.NA-workarounds

@jbrockmendel I've opened PR #48313, but it just checks for bool as well instead (and also suppresses the numpy warnings while making changes here, see #47101 (comment))
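
The PR's diff is not reproduced in this thread, so the following is only a guess at what "checking for bool" could look like: if the comparison collapses to a scalar, broadcast it to the array's shape before using it as a mask. coerce_mask is a hypothetical helper name, not a pandas function.

```python
import numpy as np


def coerce_mask(result, shape):
    """Return a boolean ndarray of `shape`, even if `result` is a scalar bool."""
    if isinstance(result, (bool, np.bool_)):
        # Scalar comparison result: every position gets the same answer.
        return np.full(shape, bool(result), dtype=bool)
    return np.asarray(result, dtype=bool)


print(coerce_mask(False, (2, 1)))                  # scalar broadcast to a 2x1 mask
print(coerce_mask(np.array([True, False]), (2,)))  # real masks pass through
```

With a guard like this, a scalar False simply means "nothing to replace" instead of triggering the AttributeError.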

simonjayhawkins · 2022-08-30T20:41:11Z

removing milestone, xref #47485 (comment)

ancri · 2023-08-25T21:23:48Z

As an ugly workaround until this gets patched, this seemed to work for me, at pandas/core/missing.py:112

ChrisMLikesMath · 2023-10-20T20:58:31Z

I've run into this on pandas 2.0/2.1 as well on line 118 of missing.py,

new_mask = arr == x
...
new_mask = new_mask.to_numpy(dtype=bool, na_value=False)

It definitely feels like a bug here. This is what appears to be happening for me:

x is of type integer
arr is of type list of integers (in my case a QList)
The comparison arr == x checks if the integer is the list, or QList, which it is not.
We expect an array but get a single bool since arr was never an array in the first place.

Generally, np.ndarray overrides the == operator to compare elementwise, which is what is expected here.

I think the hack above won't quite handle that enumeration although it'll at least not throw an exception.

Probably it'd be best to check whether arr is in fact iterable and then either convert it to a numpy array or run the comparison elementwise at the point where we call arr == x, since the fundamental assumption here is that arr performs that check on x elementwise and returns an array.

ChrisMLikesMath · 2023-10-23T16:03:38Z

I think replacing arr == x with np.equal(arr, x) might do the trick with relative ease.

I think np.equal will check if either argument is an array/iterable and do that comparison elementwise.
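
A rough sketch of that idea. np.equal is a ufunc, so it converts array-likes and compares elementwise rather than going through a subclass's __eq__ (unless the subclass also overrides __array_ufunc__). A commit note elsewhere in this thread reports that np.equal([1,2,3], "") fails, so the hypothetical helper equal_mask below falls back to an all-False mask when the dtypes cannot be compared. This is an illustration, not the code of any PR.

```python
import numpy as np


def equal_mask(arr, x):
    """Elementwise arr == x via the ufunc, with an all-False fallback."""
    values = np.asarray(arr)
    try:
        result = np.equal(values, x)
    except TypeError:
        # No comparison loop for this dtype pair (e.g. int vs str).
        result = NotImplemented
    if result is NotImplemented:
        # Defensive: treat "cannot compare" as "nothing matches".
        return np.zeros(values.shape, dtype=bool)
    return np.asarray(result, dtype=bool)


print(equal_mask([1, 2, 3], 2))   # elementwise comparison
print(equal_mask([1, 2, 3], ""))  # incomparable types: all False
```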

Fundamentally, in my case, I feel the issue might also partially fall on qpython.qcollection.QList, which claims to be an np.ndarray but doesn't implement the comparison elementwise.

Bisecting two years ago ( pandas-dev#47101 (comment) ) shows this regression was introduced in b2d54d9 in 2021. Somehow this hasn't been patched since then. PR pandas-dev#48313 was supposed to address this, but the PR was closed and never merged and the bug has persisted.

* Fix issue pandas-dev#47101
* Add a test as per PR guidelines
* Fix typo
* Resolve inconsistent namespace as per PR test https://results.pre-commit.ci/run/github/858127/1723498369.6V12SWx7T-WpLZDAXXkz0Q This web UI commit will still fail, as the E501 line-too-long check will fail until the next commit
* Resolve E501 linting errors https://results.pre-commit.ci/run/github/858127/1723498369.6V12SWx7T-WpLZDAXXkz0Q
* Fix test TypeErrors: np.equal([1,2,3], "") fails
* Quote style for Ruff
* typing and remove code backtick possibly incorrectly triggering ruff formatter
* mypy suppression for caught error
* trailing space

rainnnnny added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels May 24, 2022

simonjayhawkins added the Constructors (Series/DataFrame/Index/pd.array Constructors) label and removed the Needs Triage label May 24, 2022

simonjayhawkins added this to the Contributions Welcome milestone May 24, 2022

simonjayhawkins changed the title ~~BUG: type error in "mask_missing" method of core/missing.py~~ BUG: AttributeError: 'bool' object has no attribute 'to_numpy' in "mask_missing" method of core/missing.py May 27, 2022

simonjayhawkins added the Regression (Functionality that used to work in a prior pandas version) label May 27, 2022

simonjayhawkins modified the milestones: Contributions Welcome, 1.4.3 May 27, 2022

simonjayhawkins added the replace (replace method) label May 27, 2022

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue May 27, 2022

code sample for pandas-dev#47101

389964d

simonjayhawkins modified the milestones: 1.4.3, 1.4.4 Jun 22, 2022

simonjayhawkins mentioned this issue Aug 30, 2022

REGR: AttributeError: 'bool' object has no attribute 'to_numpy' in "mask_missing" #48313

Closed

5 tasks

simonjayhawkins mentioned this issue Aug 30, 2022

RLS: 1.4.4 #47485

Closed

simonjayhawkins removed this from the 1.4.4 milestone Aug 30, 2022

mroeschke added the Subclassing (Subclassing pandas objects) label Oct 20, 2023

tigerhawkvok mentioned this issue Aug 12, 2024

Fix issue #47101 #59492

Closed

5 tasks

tigerhawkvok linked a pull request Sep 9, 2024 that will close this issue

Fix issue #47101 #59766

Open

5 tasks
