BUG: AttributeError: 'bool' object has no attribute 'to_numpy' in "mask_missing" method of core/missing.py #47101

Open
3 tasks done
rainnnnny opened this issue May 24, 2022 · 12 comments · May be fixed by #59766
Labels: Bug; Constructors (Series/DataFrame/Index/pd.array Constructors); Regression (Functionality that used to work in a prior pandas version); replace (replace method); Subclassing (Subclassing pandas objects)

Comments

rainnnnny commented May 24, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas
import numpy
from qpython.qcollection import qlist

# numpy.nan is the spelling that also works on NumPy 2.x (numpy.NaN was removed there)
res = pandas.Series(data=qlist([True, True, False, False], qtype=1, adjust_dtype=False)).replace(False, numpy.nan)

Issue Description


new_mask = arr == x

This comparison is expected to return an array-like object of bool elements, such as [False, False, True, ...]. But when arr is a special array-like object, such as the qlist object in the example, the expression returns a single False, which is a plain bool object, so we get the AttributeError from the title.
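To make the failure mode concrete without installing qpython, here is a minimal sketch. `ScalarEqArray` is a hypothetical stand-in, not qpython's actual class; the only assumption is that the subclass overrides `__eq__` to answer "are these two containers equal?" with a single bool.

```python
import numpy as np


class ScalarEqArray(np.ndarray):
    """Hypothetical ndarray subclass whose == returns one bool, not a mask."""

    def __eq__(self, other):
        # Container-level equality instead of NumPy's elementwise comparison.
        return isinstance(other, ScalarEqArray) and np.array_equal(
            np.asarray(self), np.asarray(other)
        )


arr = np.array([True, True, False, False]).view(ScalarEqArray)

# The subclass answers with a single bool, which has no .to_numpy() method.
scalar_result = arr == False  # noqa: E712

# Converting to a base ndarray first restores the elementwise comparison.
elementwise_result = np.asarray(arr) == False  # noqa: E712

print(type(scalar_result), elementwise_result)
```

Calling `.to_numpy()` on `scalar_result` is what would raise the `AttributeError`, because a plain `bool` has no such method.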

Expected Behavior

arr == x should return an array-like object rather than a single bool.

Installed Versions

1.4.1

@rainnnnny rainnnnny added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels May 24, 2022
@simonjayhawkins
Member

Thanks @rainnnnny for the report.

Labelling as a constructor issue, as I believe the pd.Series constructor should convert/materialize array-likes passed as data into standard NumPy arrays for in-memory storage rather than storing the QList object directly.

>>> res._mgr.array
QList([ True,  True, False, False])

as a workaround

# numpy.asarray converts the qlist to a plain ndarray before pandas stores it
pandas.Series(
    data=numpy.asarray(qlist([True, True, False, False], qtype=1, adjust_dtype=False))
).replace(False, numpy.nan)

gives

0    True
1    True
2     NaN
3     NaN
dtype: object

If there is a reason that you would want to retain the QList object in memory, you should investigate creating an ExtensionArray: https://pandas.pydata.org/pandas-docs/dev/development/extending.html#extensionarray

@simonjayhawkins simonjayhawkins added Constructors Series/DataFrame/Index/pd.array Constructors and removed Needs Triage Issue that has not been reviewed by a pandas team member labels May 24, 2022
@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone May 24, 2022
@bcmyguest1

bcmyguest1 commented May 26, 2022

import pandas as pd
# case 1
df = pd.DataFrame({"d":[pd.NA]})
df.replace('',pd.NA) # throws the same error
# case 2
df = pd.DataFrame({"d":[None]})
df.replace('',pd.NA) # works fine

FYI this is not only related to QList.

new_mask ends up being [[False]] in case 2, but False in case 1.
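One plausible reading of why the two cases differ (an assumption on my part, not something confirmed in this thread) is that comparing with pd.NA propagates NA rather than producing a bool, so an elementwise comparison cannot be reduced to a boolean mask. The sketch below only shows the scalar behaviour of the two missing-value markers:

```python
import pandas as pd

# Comparing pd.NA with anything propagates NA instead of returning a bool.
na_cmp = pd.NA == ""

# Comparing None with a string is an ordinary Python comparison.
none_cmp = None == ""

# pd.NA refuses to be coerced to a bool, so it cannot fill a boolean mask.
try:
    bool(na_cmp)
    na_is_boolable = True
except TypeError:
    na_is_boolable = False

print(na_cmp, none_cmp, na_is_boolable)
```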

@simonjayhawkins
Member

> FYI this is not only related to QList.

Thanks @bcmyguest1 for this example.

It does raise a warning, `FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison`, so I guess this issue will resolve itself in a future pandas.

However, this example did work in pandas 1.3.5, so labelling as regression pending further investigation.

import pandas as pd

# case 1
df = pd.DataFrame({"d": [pd.NA]})
print(df.replace("", pd.NA))
      d
0  <NA>

Note: the FutureWarning was also given in pandas 1.3.5, but I think that it should probably have been suppressed.

@simonjayhawkins simonjayhawkins changed the title BUG: type error in "mask_missing" method of core/missing.py BUG: AttributeError: 'bool' object has no attribute 'to_numpy' in "mask_missing" method of core/missing.py May 27, 2022
@simonjayhawkins simonjayhawkins added the Regression Functionality that used to work in a prior pandas version label May 27, 2022
@simonjayhawkins simonjayhawkins modified the milestones: Contributions Welcome, 1.4.3 May 27, 2022
@simonjayhawkins simonjayhawkins added the replace replace method label May 27, 2022
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue May 27, 2022
@simonjayhawkins
Member

> However, this example did work in pandas 1.3.5, so labelling as regression pending further investigation.

first bad commit: [b2d54d9] BUG: IntegerArray/FloatingArray ufunc with 'out' kwd (#45122)

cc @jbrockmendel

@jbrockmendel
Member

  1. agreed with Simon that qlist shouldn't be held directly, or should be made an EA
  2. if it is made an EA, arr == x should be array-like
  3. we can change the check for ndarray to be specific to EA/BooleanArray
    3b) this goes in the pile of bugs created indirectly by pd.NA-workarounds

@simonjayhawkins
Member

moving to 1.4.4

@simonjayhawkins simonjayhawkins modified the milestones: 1.4.3, 1.4.4 Jun 22, 2022
@CloseChoice
Member

This goes wrong because qlist inherits from np.ndarray, so it passes all our ndarray checks and is stored directly. Maybe we could adjust sanitize_array a bit to handle qlists correctly, but we may not be able to get around an explicit cast to np.ndarray.
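A small sketch of that point, using only NumPy: an ndarray subclass passes an `isinstance` check, `np.asanyarray` lets it through unchanged, and `np.asarray` is the explicit cast back to the base class. `TaggedArray` is a hypothetical stand-in for qlist, and this is not a claim about what sanitize_array does internally.

```python
import numpy as np


class TaggedArray(np.ndarray):
    """Hypothetical ndarray subclass standing in for qlist."""


raw = np.array([True, True, False, False]).view(TaggedArray)

# A plain isinstance check cannot tell the subclass from a real ndarray.
passes_ndarray_check = isinstance(raw, np.ndarray)

# asanyarray passes subclasses through; asarray returns a base-class ndarray.
kept = np.asanyarray(raw)
plain = np.asarray(raw)

print(passes_ndarray_check, type(kept).__name__, type(plain).__name__)
```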

@simonjayhawkins
Member

simonjayhawkins commented Aug 30, 2022

> 3. we can change the check for ndarray to be specific to EA/BooleanArray
> 3b) this goes in the pile of bugs created indirectly by pd.NA-workarounds

@jbrockmendel I've opened PR #48313, but instead just added a check for bool as well (and also suppressed the numpy warnings while making changes here, see #47101 (comment))

@simonjayhawkins
Member

removing milestone, xref #47485 (comment)

@simonjayhawkins simonjayhawkins removed this from the 1.4.4 milestone Aug 30, 2022
@ancri

ancri commented Aug 25, 2023

As an ugly workaround until this gets patched, this seemed to work for me, at pandas/core/missing.py:112

[image: screenshot of the modified code]

@ChrisMLikesMath

I've run into this on pandas 2.0/2.1 as well on line 118 of missing.py,

new_mask = arr == x
...
new_mask = new_mask.to_numpy(dtype=bool, na_value=False)

It definitely feels like a bug here. This is what appears to be happening for me:

  1. x is of type integer
  2. arr is of type list of integers (in my case a QList)
  3. The comparison arr == x checks whether the integer equals the list (or QList) as a whole, which it does not.
  4. We expect an array but get a single bool since arr was never an array in the first place.

Generally, np.ndarray overrides the == operator to compare elementwise, which is what is expected here.

I think the hack above won't quite handle that enumeration although it'll at least not throw an exception.

It would probably be best to check whether arr is in fact iterable and then either convert it to a numpy array or run the comparison elementwise at the point where we call arr == x, since the fundamental assumption here is that arr performs that check against x elementwise and returns an array.
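The suggestion above could be sketched roughly as follows. `elementwise_eq` is a hypothetical helper written for illustration, not code from pandas or from any of the linked PRs; it assumes that falling back to an all-False (or all-True) mask is acceptable when the comparison collapses to a scalar.

```python
import warnings

import numpy as np


def elementwise_eq(arr, x):
    """Hypothetical helper: always return a boolean mask shaped like arr."""
    # Converts lists and strips ndarray subclasses back to a base ndarray.
    values = np.asarray(arr)
    with warnings.catch_warnings():
        # Some NumPy versions warn when the operands are not comparable.
        warnings.simplefilter("ignore")
        result = values == x
    if isinstance(result, (bool, np.bool_)):
        # The comparison collapsed to a scalar: broadcast it to a full mask.
        result = np.full(values.shape, bool(result), dtype=bool)
    return result


mask_match = elementwise_eq([1, 2, 3], 2)
mask_mismatch = elementwise_eq([1, 2, 3], "")
print(mask_match, mask_mismatch)
```

Whichever way the installed NumPy handles the int-versus-str comparison, the caller gets back an array it can use as a mask.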

@mroeschke mroeschke added the Subclassing Subclassing pandas objects label Oct 20, 2023
@ChrisMLikesMath

I think replacing arr == x with np.equal(arr, x) might do the trick with relative ease.

I think np.equal will check if either argument is an array/iterable and do that comparison elementwise.

Fundamentally, in my case, I feel the issue might also partially fall on qpython.qcollection.QList, which claims to be an np.ndarray but doesn't implement the comparison elementwise.
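A rough sketch of the np.equal idea, with assumptions stated up front: `ScalarEqArray` is a hypothetical subclass standing in for QList, and `equal_or_all_false` is a hypothetical wrapper, not the code from the linked PR. The point is that calling the ufunc directly bypasses an overridden `__eq__`, but np.equal can raise a TypeError for mismatched types, so a fallback is needed.

```python
import numpy as np


class ScalarEqArray(np.ndarray):
    """Hypothetical ndarray subclass whose == returns a single bool."""

    def __eq__(self, other):
        return False


def equal_or_all_false(arr, x):
    """Hypothetical wrapper: np.equal with a fallback for incomparable types."""
    values = np.asarray(arr)
    try:
        return np.asarray(np.equal(values, x), dtype=bool)
    except TypeError:
        # np.equal can raise for mismatched types such as int vs. str.
        return np.zeros(values.shape, dtype=bool)


arr = np.array([1, 2, 3]).view(ScalarEqArray)

via_operator = arr == 2                    # uses the overridden __eq__
via_ufunc = np.asarray(np.equal(arr, 2))   # ufunc call, elementwise
fallback = equal_or_all_false([1, 2, 3], "")

print(via_operator, via_ufunc, fallback)
```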

tigerhawkvok added a commit to tigerhawkvok/pandas that referenced this issue Aug 12, 2024
Bisecting two years ago ( pandas-dev#47101 (comment) ) shows this regression was introduced in b2d54d9 in 2021. Somehow this hasn't been patched since then.

PR pandas-dev#48313 was supposed to address this, but the PR was closed and never merged and the bug has persisted.
@tigerhawkvok tigerhawkvok mentioned this issue Aug 12, 2024
5 tasks
tigerhawkvok added a commit to tigerhawkvok/pandas that referenced this issue Sep 9, 2024
* Fix issue pandas-dev#47101

Bisecting two years ago ( pandas-dev#47101 (comment) ) shows this regression was introduced in b2d54d9 in 2021. Somehow this hasn't been patched since then.

PR pandas-dev#48313 was supposed to address this, but the PR was closed and never merged and the bug has persisted.

* Add a test as per PR guidelines

* Fix typo

* Resolve inconsistent namespace as per PR test

https://results.pre-commit.ci/run/github/858127/1723498369.6V12SWx7T-WpLZDAXXkz0Q

This web UI commit will still fail, as the E501 line-too-long check will fail until the next commit

* Resolve E501 linting errors

https://results.pre-commit.ci/run/github/858127/1723498369.6V12SWx7T-WpLZDAXXkz0Q

* Fix test TypeErrors

np.equal([1,2,3], "") fails

* Quote style for Ruff

* typing and remove code backtick possibly incorrectly triggering ruff formatter

* mypy suppression for caught error

* trailing space
@tigerhawkvok tigerhawkvok linked a pull request Sep 9, 2024 that will close this issue
5 tasks
8 participants