Backport PR #47372 on branch 1.4.x (REGR: revert behaviour change for concat with empty/all-NaN data) (#47472)

Backport PR #47372: REGR: revert behaviour change for concat with empty/all-NaN data

Co-authored-by: Joris Van den Bossche <[email protected]>
simonjayhawkins and jorisvandenbossche authored Jun 22, 2022
1 parent 30a3b98 commit ad7dc56
Showing 8 changed files with 383 additions and 140 deletions.
13 changes: 11 additions & 2 deletions doc/source/whatsnew/v1.4.0.rst
@@ -271,6 +271,9 @@ the given ``dayfirst`` value when the value is a delimited date string (e.g.
Ignoring dtypes in concat with empty or all-NA columns
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. note::
    This behaviour change has been reverted in pandas 1.4.3.

When using :func:`concat` to concatenate two or more :class:`DataFrame` objects,
if one of the DataFrames was empty or had all-NA values, its dtype was
*sometimes* ignored when finding the concatenated dtype. These are now
@@ -301,9 +304,15 @@ object, the ``np.nan`` is retained.

*New behavior*:

.. ipython:: python
.. code-block:: ipython

    In [4]: res
    Out[4]:
                       bar
    0  2013-01-01 00:00:00
    1                  NaN

res
.. _whatsnew_140.notable_bug_fixes.value_counts_and_mode_do_not_coerce_to_nan:

11 changes: 11 additions & 0 deletions doc/source/whatsnew/v1.4.3.rst
@@ -10,6 +10,17 @@ including other versions of pandas.

.. ---------------------------------------------------------------------------
.. _whatsnew_143.concat:

Behaviour of ``concat`` with empty or all-NA DataFrame columns
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The behaviour change in version 1.4.0 to stop ignoring the data type
of empty or all-NA columns with float or object dtype in :func:`concat`
(:ref:`whatsnew_140.notable_bug_fixes.concat_with_empty_or_all_na`) has been
reverted (:issue:`45637`).
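
The case this note describes can be reproduced with a small script. This is a minimal sketch, not taken from the commit: the frame names `df1` and `df2` are illustrative, and the data mirrors the `bar` column shown in the v1.4.0 whatsnew entry. The resulting dtype depends on the installed pandas version, so the script prints it instead of assuming a value.

```python
import warnings

import numpy as np
import pandas as pd

# One frame holds real datetime data; the other has a single all-NaN
# float64 column with the same name.
df1 = pd.DataFrame({"bar": [pd.Timestamp("2013-01-01")]}, index=[0])
df2 = pd.DataFrame({"bar": [np.nan]}, index=[1])

with warnings.catch_warnings():
    # Some later pandas releases emit a FutureWarning for this pattern.
    warnings.simplefilter("ignore", FutureWarning)
    res = pd.concat([df1, df2])

# Whether the all-NaN column's dtype was ignored shows up in the result dtype.
print(pd.__version__, res["bar"].dtype)
print(res)
```

Running this under different pandas releases is a quick way to see which behaviour a given installation has.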


.. _whatsnew_143.regressions:

Fixed regressions
38 changes: 38 additions & 0 deletions pandas/core/dtypes/missing.py
@@ -14,6 +14,7 @@
import pandas._libs.missing as libmissing
from pandas._libs.tslibs import (
    NaT,
    Period,
    iNaT,
)
from pandas._typing import (
@@ -668,3 +669,40 @@ def is_valid_na_for_dtype(obj, dtype: DtypeObj) -> bool:

    # fallback, default to allowing NaN, None, NA, NaT
    return not isinstance(obj, (np.datetime64, np.timedelta64, Decimal))


def isna_all(arr: ArrayLike) -> bool:
    """
    Optimized equivalent to isna(arr).all()
    """
    total_len = len(arr)

    # Usually it's enough to check but a small fraction of values to see if
    #  a block is NOT null, chunks should help in such cases.
    #  parameters 1000 and 40 were chosen arbitrarily
    chunk_len = max(total_len // 40, 1000)

    dtype = arr.dtype
    if dtype.kind == "f":
        checker = nan_checker

    elif dtype.kind in ["m", "M"] or dtype.type is Period:
        # error: Incompatible types in assignment (expression has type
        # "Callable[[Any], Any]", variable has type "ufunc")
        checker = lambda x: np.asarray(x.view("i8")) == iNaT  # type: ignore[assignment]

    else:
        # error: Incompatible types in assignment (expression has type "Callable[[Any],
        # Any]", variable has type "ufunc")
        checker = lambda x: _isna_array(  # type: ignore[assignment]
            x, inf_as_na=INF_AS_NA
        )

    return all(
        # error: Argument 1 to "__call__" of "ufunc" has incompatible type
        # "Union[ExtensionArray, Any]"; expected "Union[Union[int, float, complex, str,
        # bytes, generic], Sequence[Union[int, float, complex, str, bytes, generic]],
        # Sequence[Sequence[Any]], _SupportsArray]"
        checker(arr[i : i + chunk_len]).all()  # type: ignore[arg-type]
        for i in range(0, total_len, chunk_len)
    )
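
The chunked early-exit idea in `isna_all` can be illustrated without pandas. The following is a standard-library sketch only: `all_nan_chunked` and its parameters `min_chunk` and `n_chunks` are hypothetical names, it handles plain sequences of floats, and it is not how pandas dispatches on dtype.

```python
import math
from typing import Sequence


def all_nan_chunked(
    values: Sequence[float], min_chunk: int = 1000, n_chunks: int = 40
) -> bool:
    """Return True if every element of `values` is NaN, scanning chunk by chunk."""
    total_len = len(values)
    # Same sizing rule as in the diff: roughly n_chunks chunks, each at
    # least min_chunk elements long.
    chunk_len = max(total_len // n_chunks, min_chunk)
    # The outer all() consumes a generator, so it stops at the first chunk
    # that contains a non-NaN value and never looks at later chunks.
    return all(
        all(math.isnan(v) for v in values[i : i + chunk_len])
        for i in range(0, total_len, chunk_len)
    )


nan = float("nan")
print(all_nan_chunked([nan] * 2500))          # every value is NaN
print(all_nan_chunked([1.0] + [nan] * 2499))  # first chunk already fails
```

In pure Python the inner `all` already short-circuits per element, so chunking gains little here. With vectorized per-chunk checks, as in the diff, the saving comes from skipping the remaining chunks once one non-null value is found.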