ENH: investigate using a bitarray as the mask in the nullable/masked ExtensionArrays #31293
Labels
Enhancement
ExtensionArray
Extending pandas with custom dtypes or arrays.
Missing-data
np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
NA - MaskedArrays
Related to pd.NA and nullable extension arrays
Needs Discussion
Requires discussion from core team before further action
Currently, our nullable / masked extension arrays (boolean and integer, for now) use a numpy boolean array as their `_mask` to keep track of missing values. A potential route for improving memory use and performance would be to use a bitarray instead of a boolean numpy array (which takes one byte per value). This requires some exploration: What are the options for implementing this (existing libraries, custom implementation)? What is the performance impact? (Some things, like masking, will also be slower, since we still rely on numpy for that, and numpy needs boolean arrays.) Is a custom implementation worth it, rather than using pyarrow for this? etc.
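To make the trade-off concrete, here is a minimal sketch of the memory side of the question. It uses NumPy's own `np.packbits` / `np.unpackbits` as a stand-in for a bitarray; it does not use a dedicated bitarray library or pyarrow, and it is not how pandas stores `_mask`. The helper names `pack_mask` and `unpack_mask` are hypothetical and exist only for illustration.

```python
import numpy as np


def pack_mask(mask):
    """Pack a 1-D boolean mask into a uint8 array, 8 values per byte.

    Hypothetical helper, not part of pandas.
    """
    return np.packbits(mask)


def unpack_mask(packed, length):
    """Recover the boolean mask; `length` drops the padding bits."""
    return np.unpackbits(packed, count=length).astype(bool)


# A byte-per-value mask, as used by the masked arrays today.
mask = np.zeros(1_000_000, dtype=bool)
mask[::10] = True

packed = pack_mask(mask)
print(mask.nbytes, packed.nbytes)  # byte-per-value vs. bit-per-value storage

# Element-wise logical operations can stay in packed form ...
other = np.zeros(1_000_000, dtype=bool)
other[::7] = True
combined_packed = np.bitwise_or(packed, pack_mask(other))

# ... but NumPy boolean indexing needs a real boolean array,
# so the mask has to be unpacked first (the cost the issue mentions).
values = np.arange(1_000_000)
restored = unpack_mask(packed, len(mask))
missing_values = values[restored]
```

The packed form is 8x smaller, and combining masks with bitwise operations works without unpacking. Anything that indexes with the mask pays for an unpack step first, which is the performance question raised above.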