Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF: Implement BaseMaskedArray class for integer/boolean ExtensionArrays #30789

Merged

Conversation

jorisvandenbossche
Copy link
Member

Todo item of #29556, consolidating common code for IntegerArray and BooleanArray.

This is only a start, there is more to share.

@WillAyd
Copy link
Member

WillAyd commented Jan 7, 2020

Thanks for starting this. Just out of curiosity did you ever consider implementing this as a Mixin? I suppose the advantage of that versus something like this would be it theoretically allows EA authors to compose masked extension types without having to worry about class hierarchies

@WillAyd WillAyd added the ExtensionArray Extending pandas with custom dtypes or arrays. label Jan 7, 2020
@jorisvandenbossche
Copy link
Member Author

For now, this is not supposed to be public (actually, I just didn't consider that yet, that can certainly be a discussion). For internal use, it doesn't really matter much? What would be the advantage of a mixin in that case?

@WillAyd
Copy link
Member

WillAyd commented Jan 7, 2020

For me (being less exposed to the EA stuff) a mixin just seemed a more natural way to add a mask. Let's see what others think

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far. Do you think you'll be able to share any of array_ufunc, setitem, and ops?

pandas/core/arrays/masked.py Show resolved Hide resolved
pandas/core/arrays/masked.py Show resolved Hide resolved
@TomAugspurger
Copy link
Contributor

For now, I think just straight inheritance from EA is best. With a mixin, we'd need to include a bunch of the EA interface anyway to get this to work (dtype, isna, etc.)

def take(self, indexer, allow_fill=False, fill_value=None):
# we always fill with 1 internally
# to avoid upcasting
data_fill_value = self._internal_fill_value if isna(fill_value) else fill_value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could do _validate_fill_value like DTA/TDA/PA to avoid allowing e.g. NaT here

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice @jorisvandenbossche can you comment on what is not just a straight cut/paste

@jorisvandenbossche
Copy link
Member Author

can you comment on what is not just a straight cut/paste

Everything is still a straight cut and paste, except for:

  • The _internal_fill_value class member, which is used in take
  • I edited the to_numy docstring a little bit to be generic for both datatypes

@jreback jreback added this to the 1.0 milestone Jan 8, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. small comments

import pandas.core.common as com
from pandas.core.indexers import check_bool_array_indexer

from .masked import BaseMaskedArray
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use absolute imports

from pandas.core.ops import invalid_comparison
from pandas.core.ops.common import unpack_zerodim_and_defer
from pandas.core.tools.numeric import to_numeric

from .masked import BaseMaskedArray
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

return cls(data, mask)

def take(self, indexer, allow_fill=False, fill_value=None):
# we always fill with 1 internally
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would remove / modify these comments

@TomAugspurger
Copy link
Contributor

This is only a start, there is more to share.

@jorisvandenbossche were you planning to move any more methods in this PR, or should we merge this and do more as followups? Something like _create_comparison_method may be sharable but will be a be tricky. Might warrant its own PR.

@jreback
Copy link
Contributor

jreback commented Jan 8, 2020

this ready?

@jreback jreback merged commit 89bc0aa into pandas-dev:master Jan 9, 2020
@jreback
Copy link
Contributor

jreback commented Jan 9, 2020

thanks @jorisvandenbossche

@jorisvandenbossche jorisvandenbossche deleted the NA-base-masked-array-class branch January 9, 2020 08:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants