ROADMAP: add consistent missing values for all dtypes to the roadmap #35208

Merged: 3 commits, Aug 20, 2020
26 changes: 26 additions & 0 deletions doc/source/development/roadmap.rst
@@ -53,6 +53,32 @@ need to implement certain operations expected by pandas users (for example
the algorithm used in, ``Series.str.upper``). That work may be done outside of
pandas.

Consistent missing value handling
---------------------------------

Currently, pandas handles missing data differently for different data types. We
use different types to indicate that a value is missing (``np.nan`` for
floating-point data, ``np.nan`` or ``None`` for object-dtype data -- typically
strings or booleans -- with missing values, and ``pd.NaT`` for datetimelike
data). Integer data cannot store missing data or are cast to float. In addition,
pandas 1.0 introduced a new missing value sentinel, ``pd.NA``, which is being
used for the experimental nullable integer, boolean, and string data types.
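
As an illustration of the paragraph above (a minimal sketch, not part of the roadmap text; the variable names are hypothetical and a reasonably recent pandas is assumed), the same missing input ends up as a different sentinel depending on the dtype:

```python
import numpy as np
import pandas as pd

# Floating-point data: the missing value is stored as np.nan.
float_s = pd.Series([1.0, None])

# Object-dtype data: None is kept as-is (np.nan would also be accepted).
object_s = pd.Series(["a", None], dtype=object)

# Datetimelike data: the missing value becomes pd.NaT.
datetime_s = pd.Series(pd.to_datetime(["2020-01-01", None]))

# Integer input with a missing value is cast to float64 by default.
int_cast = pd.Series([1, None])

# The nullable integer dtype keeps integers and uses pd.NA instead.
nullable_int = pd.Series([1, None], dtype="Int64")

for name, s in [
    ("float", float_s),
    ("object", object_s),
    ("datetime", datetime_s),
    ("int (default)", int_cast),
    ("Int64 (nullable)", nullable_int),
]:
    print(f"{name:18} dtype={s.dtype!s:16} missing={s.iloc[1]!r}")
```

Note that ``isna()`` reports all of these as missing, so the difference only shows up in the stored scalar and in how operations treat it.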

These different missing values have different behaviors in user-facing
operations. Specifically, we introduced different semantics for the nullable
data types for certain operations (e.g. propagating in comparison operations
instead of comparing as False).
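
The comparison example mentioned above can be sketched as follows (an illustrative snippet with hypothetical variable names, assuming pandas 1.0 or later; it is not part of the roadmap text):

```python
import numpy as np
import pandas as pd

float_s = pd.Series([1.0, np.nan])
nullable_s = pd.Series([1, None], dtype="Int64")

# NumPy-backed float data: the missing entry compares as False,
# and the result is a plain bool Series.
float_eq = float_s == 1.0

# Nullable integer data: the missing entry propagates as pd.NA,
# and the result has the nullable "boolean" dtype.
nullable_eq = nullable_s == 1

print(float_eq.tolist(), float_eq.dtype)
print(nullable_eq.isna().tolist(), nullable_eq.dtype)
```
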
Review comment (Member):

Comparison operations are the only ones that come to mind. Are there other examples I'm missing, or is it just that in principle there could be others?

Reply (Member, Author):

There is also Kleene logic in logical operations, the boolean behaviour of the scalar value, and the behaviour with missing values in boolean indexing.
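
The three behaviours listed in the reply above can be sketched roughly as follows. This is an illustrative snippet, not code from the PR; the variable names are hypothetical, and the boolean-indexing part assumes a pandas version in which a missing value in a nullable boolean mask is treated as False (the behaviour at the time of this PR, as far as I know).

```python
import pandas as pd

# 1. Kleene logic: the result is only NA when it really depends on the
#    unknown value.
assert (pd.NA | True) is True
assert (pd.NA & False) is False
assert (pd.NA & True) is pd.NA
assert (pd.NA | False) is pd.NA

flags = pd.Series([True, False, pd.NA], dtype="boolean")
or_true = flags | True      # no NA left: anything OR True is True
and_true = flags & True     # the NA entry stays NA

# 2. Boolean behaviour of the scalar: bool(pd.NA) is ambiguous and raises.
try:
    bool(pd.NA)
    na_truthiness_raises = False
except TypeError:
    na_truthiness_raises = True

# 3. Boolean indexing: an NA entry in a nullable boolean mask does not
#    select the corresponding row.
values = pd.Series([10, 20, 30])
mask = pd.array([True, pd.NA, False], dtype="boolean")
selected = values[mask]

print(or_true.tolist(), and_true.isna().tolist())
print(na_truthiness_raises, selected.tolist())
```
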


Long term, we want to introduce consistent missing data handling for all data
types. This includes consistent behavior in all operations (indexing, arithmetic
operations, comparisons, etc.). We want to eventually make the new semantics the
default.

This has been discussed at
`github #28095 <https://github.com/pandas-dev/pandas/issues/28095>`__ (and
linked issues), and described in more detail in this
`design doc <https://hackmd.io/@jorisvandenbossche/Sk0wMeAmB>`__.

Apache Arrow interoperability
-----------------------------
