BUG: reindex with expansion and non-nanosecond dtype (pandas-dev#53505)
* BUG: reindex with expansion and non-nanosecond dtype

* Restrict to timelike types

* Check earlier

* handle NA

* handle NA

* Better check
mroeschke authored and root committed Jun 23, 2023
1 parent 14077f2 commit a514e86
Showing 3 changed files with 24 additions and 2 deletions.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.0.3.rst
@@ -23,6 +23,7 @@ Fixed regressions
 Bug fixes
 ~~~~~~~~~
 - Bug in :func:`RangeIndex.union` when using ``sort=True`` with another :class:`RangeIndex` (:issue:`53490`)
+- Bug in :func:`Series.reindex` when expanding a non-nanosecond datetime or timedelta :class:`Series` would not fill with ``NaT`` correctly (:issue:`53497`)
 - Bug in :func:`read_csv` when defining ``dtype`` with ``bool[pyarrow]`` for the ``"c"`` and ``"python"`` engines (:issue:`53390`)
 - Bug in :meth:`Series.str.split` and :meth:`Series.str.rsplit` with ``expand=True`` for :class:`ArrowDtype` with ``pyarrow.string`` (:issue:`53532`)
 - Bug in indexing methods (e.g. :meth:`DataFrame.__getitem__`) where taking the entire :class:`DataFrame`/:class:`Series` would raise an ``OverflowError`` when Copy on Write was enabled and the length of the array was over the maximum size a 32-bit integer can hold (:issue:`53616`)
13 changes: 11 additions & 2 deletions pandas/core/dtypes/cast.py
@@ -562,9 +562,16 @@ def maybe_promote(dtype: np.dtype, fill_value=np.nan):
         If fill_value is a non-scalar and dtype is not object.
     """
     orig = fill_value
+    orig_is_nat = False
     if checknull(fill_value):
         # https://github.com/pandas-dev/pandas/pull/39692#issuecomment-1441051740
         # avoid cache misses with NaN/NaT values that are not singletons
+        if fill_value is not NA:
+            try:
+                orig_is_nat = np.isnat(fill_value)
+            except TypeError:
+                pass
+
         fill_value = _canonical_nans.get(type(fill_value), fill_value)

# for performance, we are using a cached version of the actual implementation
@@ -580,8 +587,10 @@ def maybe_promote(dtype: np.dtype, fill_value=np.nan):
         # if fill_value is not hashable (required for caching)
         dtype, fill_value = _maybe_promote(dtype, fill_value)

-    if dtype == _dtype_obj and orig is not None:
-        # GH#51592 restore our potentially non-canonical fill_value
+    if (dtype == _dtype_obj and orig is not None) or (
+        orig_is_nat and np.datetime_data(orig)[0] != "ns"
+    ):
+        # GH#51592,53497 restore our potentially non-canonical fill_value
         fill_value = orig
     return dtype, fill_value
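
The check added to `maybe_promote` can be tried in isolation with NumPy alone. The sketch below is only an illustration of that condition, not the pandas implementation: `is_non_nano_nat` is a hypothetical helper name, and it passes `value.dtype` to `np.datetime_data` rather than the scalar itself.

```python
import numpy as np


def is_non_nano_nat(value) -> bool:
    """Hypothetical helper: True if `value` is a NumPy NaT whose unit is not 'ns'."""
    try:
        # np.isnat raises TypeError for anything that is not a
        # datetime64/timedelta64 value (for example float NaN or None).
        if not np.isnat(value):
            return False
    except TypeError:
        return False
    # np.datetime_data returns a (unit, count) tuple for a datetime-like dtype.
    unit, _count = np.datetime_data(value.dtype)
    return unit != "ns"


print(is_non_nano_nat(np.datetime64("NaT", "s")))    # second-resolution NaT
print(is_non_nano_nat(np.timedelta64("NaT", "ms")))  # millisecond-resolution NaT
print(is_non_nano_nat(np.datetime64("NaT", "ns")))   # nanosecond NaT
print(is_non_nano_nat(float("nan")))                 # not a datetime-like value
```

The `try`/`except TypeError` mirrors the guard in the diff: a float `NaN` fill value is null but is not a `NaT`, so it must not trigger the non-nanosecond branch.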

12 changes: 12 additions & 0 deletions pandas/tests/series/methods/test_reindex.py
@@ -12,6 +12,7 @@
     NaT,
     Period,
     PeriodIndex,
+    RangeIndex,
     Series,
     Timedelta,
     Timestamp,
@@ -422,3 +423,14 @@ def test_reindexing_with_float64_NA_log():
     result_log = np.log(s_reindex)
     expected_log = Series([0, np.NaN, np.NaN], dtype=Float64Dtype())
     tm.assert_series_equal(result_log, expected_log)
+
+
+@pytest.mark.parametrize("dtype", ["timedelta64", "datetime64"])
+def test_reindex_expand_nonnano_nat(dtype):
+    # GH 53497
+    ser = Series(np.array([1], dtype=f"{dtype}[s]"))
+    result = ser.reindex(RangeIndex(2))
+    expected = Series(
+        np.array([1, getattr(np, dtype)("nat", "s")], dtype=f"{dtype}[s]")
+    )
+    tm.assert_series_equal(result, expected)
