BUG: Append DataFrame to Series with dateutil timezone #23685

Merged
merged 5 commits into from
Nov 15, 2018
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.rst
@@ -1372,6 +1372,7 @@ Reshaping
 - Bug in :func:`pandas.concat` when concatenating a multicolumn DataFrame with tz-aware data against a DataFrame with a different number of columns (:issue:`22796`)
 - Bug in :func:`merge_asof` where confusing error message raised when attempting to merge with missing values (:issue:`23189`)
 - Bug in :meth:`DataFrame.nsmallest` and :meth:`DataFrame.nlargest` for dataframes that have a :class:`MultiIndex` for columns (:issue:`23033`).
+- Bug in :meth:`DataFrame.append` with a :class:`Series` with a dateutil timezone would raise a ``TypeError`` (:issue:`23682`)

 .. _whatsnew_0240.bug_fixes.sparse:

23 changes: 12 additions & 11 deletions pandas/_libs/lib.pyx
@@ -48,8 +48,7 @@ cdef extern from "src/parse_helper.h":
     int floatify(object, float64_t *result, int *maybe_int) except -1

 cimport util
-from util cimport (is_nan,
-                   UINT8_MAX, UINT64_MAX, INT64_MAX, INT64_MIN)
+from util cimport is_nan, UINT64_MAX, INT64_MAX, INT64_MIN

 from tslib import array_to_datetime
 from tslibs.nattype cimport NPY_NAT
@@ -1642,20 +1641,22 @@ def is_datetime_with_singletz_array(values: ndarray) -> bool:

     if n == 0:
         return False
-
+    # Get a reference timezone to compare with the rest of the tzs in the array
     for i in range(n):
         base_val = values[i]
         if base_val is not NaT:
             base_tz = get_timezone(getattr(base_val, 'tzinfo', None))
-
-            for j in range(i, n):
-                val = values[j]
-                if val is not NaT:
-                    tz = getattr(val, 'tzinfo', None)
-                    if not tz_compare(base_tz, tz):
-                        return False
             break

+    for j in range(i, n):
+        # Compare val's timezone with the reference timezone
+        # NaT can coexist with tz-aware datetimes, so skip if encountered
+        val = values[j]
+        if val is not NaT:
+            tz = getattr(val, 'tzinfo', None)
+            if not tz_compare(base_tz, tz):
+                return False
+
     return True
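
For readers who want to follow the control flow outside Cython, here is a rough pure-Python sketch of the same two-pass idea: pick a reference timezone from the first non-missing value, then compare every later non-missing value against it. The names `NAT`, `same_tz`, and `has_single_tz` are hypothetical stand-ins for `NaT`, `tz_compare`, and `is_datetime_with_singletz_array`. The real `get_timezone` and `tz_compare` helpers are not reproduced here, only approximated with plain equality.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for pandas' NaT sentinel, for illustration only.
NAT = object()


def same_tz(left, right):
    """Simplified stand-in for tz_compare: plain equality of tzinfo objects."""
    return left == right


def has_single_tz(values):
    """Return True if every non-missing value shares one timezone."""
    n = len(values)
    if n == 0:
        return False

    # Pass 1: use the first non-missing value's tzinfo as the reference.
    base_tz = None
    start = n
    for i, value in enumerate(values):
        if value is not NAT:
            base_tz = getattr(value, "tzinfo", None)
            start = i
            break

    # Pass 2: compare the remaining non-missing values with the reference.
    for value in values[start:]:
        if value is not NAT:
            if not same_tz(base_tz, getattr(value, "tzinfo", None)):
                return False
    return True


utc = timezone.utc
est = timezone(timedelta(hours=-5))
stamp_utc = datetime(2018, 10, 24, 7, 30, tzinfo=utc)
stamp_est = datetime(2018, 10, 24, 7, 30, tzinfo=est)

print(has_single_tz([NAT, stamp_utc, NAT, stamp_utc]))  # missing values are skipped
print(has_single_tz([stamp_utc, stamp_est]))            # mixed timezones
print(has_single_tz([stamp_utc, 1.0]))                  # a float has no tzinfo
```

Values without a `tzinfo` attribute are treated as having a timezone of `None`, so an array that mixes a tz-aware timestamp with floats does not count as single-timezone in this sketch.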


@@ -2045,7 +2046,7 @@ def maybe_convert_objects(ndarray[object] objects, bint try_float=0,

     # we try to coerce datetime w/tz but must all have the same tz
     if seen.datetimetz_:
-        if len({getattr(val, 'tzinfo', None) for val in objects}) == 1:
+        if is_datetime_with_singletz_array(objects):
             from pandas import DatetimeIndex
             return DatetimeIndex(objects)
         seen.object_ = 1
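
The diff does not say why the removed set comprehension raised a `TypeError`. One plausible explanation, stated here as an assumption, is that a set requires hashable members, while some tzinfo classes support `==` but not `hash()`. The sketch below uses a hypothetical `UnhashableUTC` class (not a dateutil class) to show how a set-based check can fail where an equality-based comparison succeeds. `set_based_check` and `pairwise_check` are illustrative names, not pandas functions.

```python
from datetime import datetime, timedelta, tzinfo


class UnhashableUTC(tzinfo):
    """Hypothetical tzinfo that supports == but not hash()."""

    def utcoffset(self, dt):
        return timedelta(0)

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return "UTC"

    def __eq__(self, other):
        return isinstance(other, UnhashableUTC)

    __hash__ = None  # instances cannot be placed in a set


def set_based_check(values):
    # Same shape as the removed line: collects tzinfo objects into a set.
    return len({getattr(v, "tzinfo", None) for v in values}) == 1


def pairwise_check(values):
    # Compares by equality only, so hashing is never required.
    tzs = [getattr(v, "tzinfo", None) for v in values]
    return all(tz == tzs[0] for tz in tzs)


stamps = [
    datetime(2018, 10, 24, 7, 30, tzinfo=UnhashableUTC()),
    datetime(2018, 10, 25, 7, 30, tzinfo=UnhashableUTC()),
]

try:
    set_based_check(stamps)
except TypeError as exc:
    print("set-based check failed:", exc)

print("pairwise check:", pairwise_check(stamps))
```

Comparing pairwise, as `is_datetime_with_singletz_array` does through `tz_compare`, avoids the hashing requirement altogether.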
15 changes: 15 additions & 0 deletions pandas/tests/reshape/test_concat.py
@@ -1010,6 +1010,21 @@ def test_append_missing_column_proper_upcast(self, sort):
         assert appended['A'].dtype == 'f8'
         assert appended['B'].dtype == 'O'

+    def test_append_empty_frame_to_series_with_dateutil_tz(self):
+        # GH 23682
+        date = Timestamp('2018-10-24 07:30:00', tz=dateutil.tz.tzutc())
+        s = Series({'date': date, 'a': 1.0, 'b': 2.0})
+        df = DataFrame(columns=['c', 'd'])
+        result = df.append(s, ignore_index=True)
+        expected = DataFrame([[np.nan, np.nan, 1., 2., date]],
+                             columns=['c', 'd', 'a', 'b', 'date'])
+        # These columns get cast to object after append
+        object_cols = ['c', 'd', 'date']
+        expected.loc[:, object_cols] = expected.loc[:, object_cols].astype(
+            object
+        )
+        assert_frame_equal(result, expected)
Contributor

@mroeschke what were the expected dtypes of result and expected here?

On master, everything is object, which may not have been intended.

On #24024, expected has datetime64[ns, tzutc()] dtype for the date column.

Member Author

@TomAugspurger datetime64[ns, tzutc()] would be the correct dtype. I may have cast to object due to an append bug that existed prior. Feel free to change this test in #24024



 class TestConcatenate(ConcatenateBase):
