API: Localize Series when calling to_datetime with utc=True (#6415) #17109

Merged
merged 18 commits into from
Sep 1, 2017

Conversation

mroeschke
Member

When _convert_listlike passes a np.array into a Series, the Series seems to initially be of naive datetime dtype. This PR localizes the Series to UTC if utc=True.

I made one minor change to an existing test that called tz_localize on the index after originally setting the index from a Series with pd.to_datetime(..., utc=True).
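As a minimal sketch of the behavior this PR targets (assuming a pandas version that includes the fix), passing a Series to to_datetime with utc=True should give a timezone-aware result rather than a naive one. The sample strings mirror those used in the tests discussed below.

```python
import pandas as pd

# Sample data mirroring the strings used in this PR's tests.
s = pd.Series(['20100102 121314', '20100102 121315'])

# With the fix, the result is tz-aware (UTC) instead of tz-naive.
result = pd.to_datetime(s, format='%Y%m%d %H%M%S', utc=True)

print(result.dt.tz)
print(isinstance(result.dtype, pd.DatetimeTZDtype))
```

Before the fix, the same call returned a naive datetime Series, which is why the existing test needed an extra tz_localize('UTC') step.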

@gfyoung gfyoung added Bug Timezones Timezone data dtype labels Jul 29, 2017
@@ -2130,7 +2130,7 @@ def test_set_index_datetime(self):
'2011-07-19 08:00:00', '2011-07-19 09:00:00'],
'value': range(6)})
df.index = pd.to_datetime(df.pop('datetime'), utc=True)
- df.index = df.index.tz_localize('UTC').tz_convert('US/Pacific')
+ df.index = df.index.tz_convert('US/Pacific')
Member

Just for future reference, can you explain here why you needed to change this test?

@@ -508,6 +508,10 @@ def _convert_listlike(arg, box, format, name=None, tz=tz):
from pandas import Series
values = _convert_listlike(arg._values, False, format)
result = Series(values, index=arg.index, name=arg.name)
if utc and isinstance(values, np.ndarray):
Contributor

should be inside _convert_listlike

Member Author

I'm not sure how to include this logic in convert_listlike as is.

._values is called on Series before it's passed into convert_listlike which turns the Series into a np.array, so convert_listlike doesn't "know" that a Series was the original argument.

Not sure why this was done originally, but I could have convert_listlike handle a Series directly?
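The limitation described here can be seen with public API alone. This sketch uses Series.to_numpy() as a stand-in for the internal ._values access (an assumption made for illustration): the resulting ndarray keeps the data but not the index, the name, or the fact that it came from a Series.

```python
import numpy as np
import pandas as pd

s = pd.Series(['20100102 121314'], index=['row0'], name='stamp')

# to_numpy() stands in here for the internal ._values access.
values = s.to_numpy()

# The array no longer carries Series metadata ...
print(type(values).__name__)
print(hasattr(values, 'index'))

# ... so the caller has to re-attach it after conversion.
rebuilt = pd.Series(values, index=s.index, name=s.name)
```

Any function that only receives the array cannot tell whether a Series, an Index, or a plain list was the original argument.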

Contributor

not sure how easy that is, but you could.

@@ -270,6 +270,32 @@ def test_to_datetime_utc_is_true(self):
expected = pd.DatetimeIndex(data=date_range)
tm.assert_index_equal(result, expected)

# GH 6415: UTC=True with Series
data = ['20100102 121314', '20100102 121315']
Contributor

make a new test

data = ['20100102 121314', '20100102 121315']
result = pd.to_datetime(pd.Series(data),
format='%Y%m%d %H%M%S',
utc=True)
Contributor

this should work if box is True, IOW when an Index or Series is passed

@@ -260,7 +260,7 @@ Conversion

- Bug in assignment against datetime-like data with ``int`` may incorrectly converte to datetime-like (:issue:`14145`)
- Bug in assignment against ``int64`` data with ``np.ndarray`` with ``float64`` dtype may keep ``int64`` dtype (:issue:`14001`)

- Bug in :func:`to_datetime` incorrectly localizing dates in `Series` when `utc=True` (:issue:`6415`)
Contributor

this needs a sub section as it's a major change

Member Author

What section in the whatsnew should I put this under? Enhancement? API breaking?

Member

I feel like this is an API-breaking change. Enhancement implies it adds functionality that's beneficial and isn't patching existing behavior per se.

Contributor

this is a major change that is API breaking.

@mroeschke mroeschke force-pushed the fix_6415 branch 2 times, most recently from 12b9c58 to e85263d Compare August 7, 2017 05:28
UTC Localization with Series
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously, :func:`to_datetime` did not localize datetime ``Series`` data as when ``utc=True`` was passed. Now, :func:`to_datetime`
Contributor

just say what it's going to do, this is redundant otherwise.

Previously, :func:`to_datetime` did not localize datetime ``Series`` data as when ``utc=True`` was passed. Now, :func:`to_datetime`
will correctly localize `Series` with a `datetime64[ns, UTC]` data type. (:issue:`6415`)

Old Behavior
Contributor

previous behaviour

@@ -359,7 +359,9 @@ def _convert_listlike(arg, box, format, name=None, tz=tz):
return DatetimeIndex(arg, tz=tz, name=name)
except ValueError:
pass

from pandas import Series
Contributor

this should also work for a DTI.

use ABCSeries rather than importing.

I am not sure this is the best location for this.

@@ -379,11 +381,12 @@ def _convert_listlike(arg, box, format, name=None, tz=tz):
raise TypeError('arg must be a string, datetime, list, tuple, '
'1-d array, or Series')

arg = _ensure_object(arg)
obj_arg = _ensure_object(arg)
Contributor

why are you changing this?

Member Author

@mroeschke mroeschke Aug 7, 2017

Since I was trying to have _convert_listlike handle Series arguments directly, I was using the arg variable to keep track if a Series was initially passed so I know to use dt.tz_localize() after the dates have been parsed. arg = _ensure_object(arg) converts arg from a Series to a np.array so I no longer know what data structure was originally passed.

I guess I could create a boolean variable arg_is_series in the beginning and keep track of this information that way.

Contributor

can you revert the obj_arg thing, hard to tell what's changing here

if is_datetime64_dtype(result) and box:
result = DatetimeIndex(result, tz=tz, name=name)
# GH 6415
Contributor

too specific change here I think (see my comment above)

Contributor

better to make a wrapper function which handles the utc argument (IOW if it's a DTI / Series and utc is true it will localize, otherwise it will pass thru).
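A wrapper along these lines might look like the sketch below. The function name and its exact behavior are hypothetical, not the code that was merged; in particular, converting already tz-aware input to UTC is an assumption made here for illustration.

```python
import pandas as pd


def maybe_convert_to_utc(obj, utc):
    """Hypothetical helper: localize or convert to UTC only when utc is True."""
    if not utc:
        return obj
    if isinstance(obj, pd.Series):
        # A Series exposes datetime methods through the .dt accessor.
        if obj.dt.tz is None:
            return obj.dt.tz_localize('UTC')
        return obj.dt.tz_convert('UTC')
    if isinstance(obj, pd.DatetimeIndex):
        if obj.tz is None:
            return obj.tz_localize('UTC')
        return obj.tz_convert('UTC')
    # Anything else passes through untouched.
    return obj


naive_index = pd.DatetimeIndex(['2010-01-02 12:13:14'])
naive_series = pd.Series(naive_index)

utc_index = maybe_convert_to_utc(naive_index, utc=True)
utc_series = maybe_convert_to_utc(naive_series, utc=True)
untouched = maybe_convert_to_utc(naive_series, utc=False)
```

Keeping the utc handling in one place means the parsing code does not need a Series-specific branch of its own.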

@@ -270,6 +270,38 @@ def test_to_datetime_utc_is_true(self):
expected = pd.DatetimeIndex(data=date_range)
Contributor

you didn't have to change any tests here?

Member Author

You mentioned previously to move these tests to their own function. Is that what you were referring to?

UTC Localization with Series
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`to_datetime` now correctly localizes datetime ``Series`` data as when ``utc=True`` was passed with
Contributor

remove "correctly"; say it localizes to UTC when utc=True is passed

if isinstance(arg, ABCSeries):
arg = arg.dt.tz_localize('UTC')
elif isinstance(arg, DatetimeIndex):
arg = arg.tz_convert(None).tz_localize('UTC')
Contributor

hmm, why are you converting to naive for a DTI?
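To make the question concrete, here is a small sketch comparing the two routes for a tz-aware DatetimeIndex. The timezone and timestamp are arbitrary illustrative values.

```python
import pandas as pd

aware = pd.DatetimeIndex(['2013-01-01 00:00:00'], tz='America/Los_Angeles')

# Route from the diff: drop the tz (values become naive UTC), then relabel as UTC.
roundabout = aware.tz_convert(None).tz_localize('UTC')

# Direct route for an index that is already tz-aware.
direct = aware.tz_convert('UTC')

print(roundabout.equals(direct))

# A tz-naive index cannot be passed to tz_convert at all; it must be localized.
naive = pd.DatetimeIndex(['2013-01-01 00:00:00'])
try:
    naive.tz_convert(None)
except TypeError as exc:
    print('TypeError:', exc)
```

For tz-aware input the detour through naive values gives the same instants as a direct tz_convert('UTC'), and for tz-naive input the tz_convert(None) call fails, so the two cases need separate handling either way.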

@mroeschke
Member Author

I noticed that some IO SQL tests have been failing.

It appears that in the conversion from SQL to DataFrame the function _handle_date_column has pd.to_datetime(..., utc=True) hardcoded which is now localizing the columns to utc by default after this fix.

I feel we shouldn't assume these database dates are utc by default i.e. remove hardcoded utc=True. Thoughts?

@jorisvandenbossche
Member

@mroeschke yeah, that sql issue might be a tricky issue, as I recall we had a long discussion about that. The utc=True was added in #7364.
It was done to deal with datetime objects with timezones coming from the database. The end result of using utc=True was that we always converted the timezone aware data to naive utc.

But, that also dates back from the time that a Series could not hold timezone aware data. So now, I think it would be OK (or even better) to have it as timezone aware but still UTC data (the reason that it has to be utc, if I recall, is that the database uses fixed offsets, which can lead to different timezones around DST, which results in errors when converting that to a datetime64 column. Therefore we decided to always convert timezone aware data to UTC)

So given that, it might be that we just need to update the tests?
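The point about fixed offsets can be illustrated with a short sketch. The dates and offsets are made up, and the standard library's timezone class stands in for whatever tzinfo a database driver attaches.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

# Fixed offsets, as a database driver might attach them (illustrative values).
plus_two = timezone(timedelta(hours=2))
plus_one = timezone(timedelta(hours=1))

rows = pd.Series([
    datetime(2017, 10, 29, 0, 30, tzinfo=plus_two),  # before a DST change
    datetime(2017, 10, 29, 3, 30, tzinfo=plus_one),  # after a DST change
])

# Mixed offsets cannot share one fixed-offset tz, so the column is object dtype.
print(rows.dtype)

# Converting to UTC gives a single, consistent tz-aware dtype.
as_utc = pd.to_datetime(rows, utc=True)
print(as_utc)
```

With utc=True every value is expressed against the same reference, so the column no longer depends on which offset the driver happened to attach to each row.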

@jorisvandenbossche jorisvandenbossche added this to the 0.21.0 milestone Aug 10, 2017
@jreback
Contributor

jreback commented Aug 10, 2017

actually the db issue might be more complex and just changing tests is not the correct thing to do. See what is actually returned currently in the database. IIRC postgres stores a datetime-aware tz, but it's actually in the local timezone, IOW it loses the offset info (it does convert to UTC from wherever it was, it just stores it locally). So you need to reverse this on de-serialization.

@cpcloud

@jorisvandenbossche
Member

jorisvandenbossche commented Aug 10, 2017

@jreback looking at the issues I linked to (but it is a long time ago I actually looked at it ..), the data we get from postgres are datetime objects with pytz psycopg2 fixed offsets (so no actual timezone). So I think converting to UTC is, as we decided back then, still the most sensible thing to do (up to now, it only loses the 'utc time zone', as Series back then didn't support having a timezone. So now we can still have UTC data, but just preserve the fact that it is timezone aware)

@jreback
Contributor

jreback commented Aug 10, 2017

@jorisvandenbossche postgres loses information in the timezone. They are UTC dates, but in the local timezone. you need to match the existing behavior.

@jorisvandenbossche
Member

jorisvandenbossche commented Aug 10, 2017

> you need to match the existing behavior.

Sorry, what do you mean? Do you mean existing behaviour of read_sql?
As I already said, read_sql already returns utc data for 'timestamp with time zone' data (so how postgres stores them). It now only converts it to "utc naive data" (as Series could not hold timezone information), which, due to the changes in this PR, changes to "utc aware data": the same values, but now with the timezone information.
IMO this change is not necessarily bad (I think it would have done that in #7364 if it had been possible at that time)

@mroeschke
Member Author

I agree with @jorisvandenbossche

For reference, postgres timestamp with time zone is stored in UTC, and upon output:

it is always converted from UTC to the current timezone zone, and displayed as local time in that zone

But more importantly, the psycopg2 driver returns datetime objects with psycopg2.tz.FixedOffsetTimezone, and since this has historically been converted to UTC (naive), keeping it UTC sounds good to me (but now aware after this PR). I can add an additional note in the whatsnew
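The difference between "UTC naive" and "UTC aware" can be sketched with the standard library alone. The fixed offset below is an arbitrary stand-in for a value carrying psycopg2.tz.FixedOffsetTimezone; no database is involved.

```python
from datetime import datetime, timedelta, timezone

# Stand-in for a value a driver returns with a fixed offset (here UTC-07:00).
from_db = datetime(2017, 8, 10, 5, 0, tzinfo=timezone(timedelta(hours=-7)))

# Same instant expressed in UTC, still tz-aware (the behavior after this PR).
aware_utc = from_db.astimezone(timezone.utc)

# Historical behavior: the same UTC wall time with the tz dropped.
naive_utc = aware_utc.replace(tzinfo=None)

print(aware_utc.isoformat())  # 2017-08-10T12:00:00+00:00
print(naive_utc.isoformat())  # 2017-08-10T12:00:00
```

Both values show the same wall-clock time; only the aware one records that this time is UTC.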

@pep8speaks

pep8speaks commented Aug 15, 2017

Hello @mroeschke! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on September 01, 2017 at 07:00 Hours UTC

@codecov

codecov bot commented Aug 15, 2017

Codecov Report

Merging #17109 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17109      +/-   ##
==========================================
- Coverage   91.03%   91.02%   -0.02%     
==========================================
  Files         163      163              
  Lines       49580    49581       +1     
==========================================
- Hits        45137    45130       -7     
- Misses       4443     4451       +8

Flag        Coverage Δ
#multiple   88.8% <28.57%> (-0.01%) ⬇️
#single     40.26% <100%> (-0.06%) ⬇️

Impacted Files                   Coverage Δ
pandas/core/tools/datetimes.py   67.3% <100%> (+0.27%) ⬆️
pandas/io/sql.py                 94.8% <100%> (ø) ⬆️
pandas/io/gbq.py                 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py             97.72% <0%> (-0.1%) ⬇️



pd.to_datetime(s, utc=True)

This new behavior will also localize datetime columns in DataFrames returned from :func:`read_sql` which previously returned datetime columns as naive UTC.
Contributor

this would frighten me. you have to be explicit that this is only for columns that are already datetime tz-aware.

.. _whatsnew_0210.api.utc_localization_with_series:

UTC Localization with Series
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Contributor

add the issue number and provide a short explanation.

if arg.tz is None:
arg = arg.tz_localize('UTC')
else:
arg = arg.tz_convert('UTC')
Contributor

this is a BIG change here. we don't convert existing tz-aware timezones. why are you doing this?

if is_datetime64_dtype(result) and box:
result = DatetimeIndex(result, tz=tz, name=name)
# GH 6415
elif arg_is_series and utc:
Contributor

this is kludgy. pls find a better way.

Member Author

The reason why I am doing this is because a Series is converted to a numpy array for the date conversion, and _convert_listlike needs an indicator to return back a Series.

The alternative is to have _convert_listlike return an array and have to_datetime do the conversion and localization. Which option is preferable?

Contributor

ok, name this is_series and do this outside of the convert_listlike function.

Contributor

you don't need this if here, instead do it inside the maybe_convert_to_utc function

Member

I think (didn't look in detail) that this kludginess is coming from the fact that for a Series _convert_listlike now returns an array (box is set to False when dealing with a Series), while for a DatetimeIndex it returns a DatetimeIndex. And an array cannot hold tz-aware data, so the utc keyword needs to be handled separately for Series and DatetimeIndex when using this current implementation.

But, can't we just remove the box=False for converting Series objects? And so unify the code for returning Series vs DatetimeIndex (conversion from DatetimeIndex to Series (instead of from array to Series) should be rather performant?)

Member Author

@mroeschke mroeschke Aug 16, 2017

@jorisvandenbossche right idea but not directly related to box.

_convert_listlike can essentially be summarized in two operations:

  1. If the data already has a datetime dtype, take a short cut.
  2. Otherwise, make sure the data is of object dtype and try to parse.

In step 2, a Series is passed into _ensure_object which turns a Series into a numpy array with object dtype. Therefore, the kludge is due to attempting to have _convert_listlike "remember" that a Series was passed.

But sure, I can try to move this to the _maybe_convert_to_utc function

Member

OK, but for step 2, why do you then need to have _convert_listlike handle Series vs arrays/lists of strings differently?
(As far as I can see, this is because for a Series the values are not converted to a DatetimeIndex at the end of _convert_listlike, and so the handling of utc has to be done afterwards again, but I can be wrong)

Member Author

My first fix for this issue was to have _convert_listlike just return an array and have to_datetime feed that array into a Series constructor and then localize to utc. To avoid leaking the conversion logic out of _convert_listlike, I am trying to have _convert_listlike return a localized Series outright.

So yes, for a Series the values are not converted to a DatetimeIndex (which can handle the localization with the tz arg) at the end of _convert_listlike, and when passing the values to a Series constructor, an extra step is needed to localize the Series to utc, i.e. Series(converted_array).dt.tz_localize('UTC')

Member

@jorisvandenbossche jorisvandenbossche Aug 16, 2017

So my suggestion is: can't we just also for Series return a DatetimeIndex (which already handles the utc keyword fine), and then afterwards just convert this to a Series as we now convert the array to a Series (then there is no need to either leak the conversion out of _convert_listlike or to deal with Series construction inside _convert_listlike).
(but it might be that there are other reasons for having it as an array that might break, I don't know)
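The two routes being compared in this exchange can be sketched as follows. The parsed array is a stand-in for the output of the internal conversion, and the index and name values are illustrative.

```python
import numpy as np
import pandas as pd

original = pd.Series(['2010-01-02 12:13:14'], index=['row0'], name='stamp')

# Stand-in for the parsed, tz-naive array produced during conversion.
parsed = np.array(['2010-01-02T12:13:14'], dtype='datetime64[ns]')

# Option 1: build a Series from the array, then localize as an extra step.
via_array = pd.Series(parsed, index=original.index, name=original.name)
via_array = via_array.dt.tz_localize('UTC')

# Option 2: let DatetimeIndex apply the tz, then wrap it in a Series.
via_index = pd.Series(pd.DatetimeIndex(parsed, tz='UTC'),
                      index=original.index, name=original.name)

# Both routes end up with the same tz-aware Series.
pd.testing.assert_series_equal(via_array, via_index)
```

Option 2 reuses the tz handling that DatetimeIndex already has, which is the unification suggested in the comment above.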

@jreback jreback removed this from the 0.21.0 milestone Aug 15, 2017
@jreback jreback changed the title BUG: Localize Series when calling to_datetime with utc=True (#6415) API: Localize Series when calling to_datetime with utc=True (#6415) Aug 15, 2017
@jreback jreback added API Design and removed Bug labels Aug 15, 2017
utc=True)
expected = pd.Series(expected_data)
tm.assert_series_equal(result, expected)
result = pd.to_datetime(pd.Index(data),
Contributor

use blank lines in between different sub-tests

Contributor

instead of duplicating lots of code, break this big test into separate tests and use parametrize to deal with Series/Index conversions
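A parametrized version might look like the sketch below. The test name and the box parameter are hypothetical, and the assertions are deliberately simple rather than the ones used in the pandas test suite.

```python
import pandas as pd
import pytest


@pytest.mark.parametrize("box", [pd.Series, pd.Index])
def test_to_datetime_utc_true(box):
    # One test body covers both containers instead of duplicating it.
    data = ['20100102 121314', '20100102 121315']
    result = pd.to_datetime(box(data), format='%Y%m%d %H%M%S', utc=True)

    expected = [pd.Timestamp('2010-01-02 12:13:14', tz='UTC'),
                pd.Timestamp('2010-01-02 12:13:15', tz='UTC')]
    assert list(result) == expected

    # A Series exposes tz via .dt, a DatetimeIndex exposes it directly.
    tz = result.dt.tz if isinstance(result, pd.Series) else result.tz
    assert str(tz) == 'UTC'
```

pytest generates one test case per entry in the list, so a failure report shows which container type broke.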

@@ -1319,14 +1323,15 @@ def check(col):
def test_date_parsing(self):
# No Parsing
df = sql.read_sql_table("types_test_data", self.conn)

# Now that GH 6415 is fixed, dates are automatically parsed to UTC
Contributor

write these reversed, IOW

dates are automatically parsed to UTC (GH 6415). A future reader likely won't look at the reference unless the comment is not good enough; the comment needs to stand on its own.

@@ -1352,7 +1357,11 @@ def test_datetime(self):
# with read_table -> type information from schema used
result = sql.read_sql_table('test_datetime', self.conn)
result = result.drop('index', axis=1)
tm.assert_frame_equal(result, df)
# After GH 6415, dates outbound from a db will be localized to UTC
Contributor

what happens if df.A is localized to UTC (IOW before to_sql)? does it work?

# comes back as datetime64
tm.assert_series_equal(res['a'], to_datetime(df['a']))
# GH 6415 comes back as datetime64[ns, UTC]
tm.assert_series_equal(res['a'], to_datetime(df['a'], utc=True))
Contributor

use result, expected here, otherwise too hard to read

@jorisvandenbossche jorisvandenbossche added this to the 0.21.0 milestone Aug 15, 2017
@jorisvandenbossche
Member

@mroeschke Based on the diff in the sql tests, it seems that you now have read_sql always return UTC-aware data (is this correct?), while IMO we should only return tz-aware data when the SQL type is timestamp with time zone, and still return plain tz-naive datetime data when the SQL type is just timestamp

@mroeschke
Member Author

mroeschke commented Aug 16, 2017

@jorisvandenbossche Ah yes, you're right. I think I was too aggressive in changing the SQL tests.

I will have to dig deeper into SQL routines to see if I can determine the original SQL type and pass that info along to the _handle_date_column function which is now making datetime columns tz aware with to_datetime(..., utc=True)

@mroeschke
Member Author

Addressed your comments @jorisvandenbossche and all green.

pandas/io/sql.py Outdated
@@ -821,7 +821,12 @@ def _harmonize_columns(self, parse_dates=None):

if (col_type is datetime or col_type is date or
col_type is DatetimeTZDtype):
self.frame[col_name] = _handle_date_column(df_col)
if col_type is DatetimeTZDtype:
Contributor

just

utc = col_type is DatetimeTZDtype
self.frame[.....] = _handle_date_column(df_col, utc=utc)
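The suggestion above can be tried out in isolation with the sketch below. The handle_date_column function is a simplified stand-in for the internal _handle_date_column helper, and the col_types mapping is a hypothetical substitute for type information derived from the SQL schema.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd
from pandas import DatetimeTZDtype


def handle_date_column(col, utc=False):
    # Simplified stand-in for the internal _handle_date_column helper.
    return pd.to_datetime(col, utc=utc)


offset = timezone(timedelta(hours=-8))
frame = pd.DataFrame({
    'plain': [datetime(2000, 1, 1, 0, 0)],
    'with_tz': [datetime(2000, 1, 1, 0, 0, tzinfo=offset)],
})

# Hypothetical mapping from column name to the type derived from the schema.
col_types = {'plain': datetime, 'with_tz': DatetimeTZDtype}

for col_name, col_type in col_types.items():
    # Only columns declared as tz-aware are converted to UTC.
    utc = col_type is DatetimeTZDtype
    frame[col_name] = handle_date_column(frame[col_name], utc=utc)

print(frame.dtypes)
```

Computing the flag once and passing it through avoids a separate branch for each column type.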

UTC Localization with Series
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously, :func:`to_datetime` did not localize datetime ``Series`` data when ``utc=True`` was passed. Now, :func:`to_datetime` will correctly localize `Series` with a `datetime64[ns, UTC]` data type to be consistent with how list-like and Index data are handled. (:issue:`6415`).
Contributor

use double backticks around Index and Series

Contributor

also around the datetime64[ns, UTC]; otherwise these get lost in the text

say dtype instead of 'data type'

.astype('datetime64[ns, UTC]'))
col = expected.DateColWithTz
assert is_datetime64tz_dtype(col.dtype)
# Removed ".astype('datetime64[ns, UTC]')"after GH 6415 was fixed
Contributor

remove this comment; you can explain why things are, but referring to the past is confusing

@@ -2130,7 +2130,8 @@ def test_set_index_datetime(self):
'2011-07-19 08:00:00', '2011-07-19 09:00:00'],
'value': range(6)})
df.index = pd.to_datetime(df.pop('datetime'), utc=True)
df.index = df.index.tz_localize('UTC').tz_convert('US/Pacific')
# Removed 'tz_localize('utc') below after GH 6415 was fixed
Contributor

same

[('2013-01-01 00:00:00-01:00', None),
('2013-01-01 00:00:00-01:00', 'datetime64[ns]'),
('2013-01-01 01:00:00', None),
('2013-01-01 01:00:00', 'datetime64[ns]')])
Member

You can write this a bit shorter by doing:

@pytest.mark.parametrize("data", ['2013-01-01 00:00:00-01:00', '2013-01-01 01:00:00'])
@pytest.mark.parametrize("dtype", [None, 'datetime64[ns]'])
def test...

then you get all combinations of the two automatically.

But I still don't fully understand the point of the test (or for the '2013-01-01 01:00:00' case. Isn't that covered by the test above?)

What is not tested above is a string that includes a tz offset, or a Series with an already datetime64 dtype (both aware or naive). So I would just test those two cases? (and both also seem distinct, so would put them in different tests)

@mroeschke
Member Author

Fixed the whatsnew and extraneous comments and created additional tests for tz-aware strings and Series with datetime64[ns] dtypes.

I also stumbled upon this inconsistency when I was trying to add an additional test for the 'datetime64[ns, CET]' dtype:

In [3]: pd.Series(['2013-01-01 00:00:00'], dtype='datetime64[ns, CET]')
Out[3]: 
0   2013-01-01 01:00:00+01:00
dtype: datetime64[ns, CET]

In [4]: pd.Series([pd.Timestamp('2013-01-01 00:00:00', tz='CET')])
   ...: 
Out[4]: 
0   2013-01-01 00:00:00+01:00
dtype: datetime64[ns, CET]

I think these should be consistent and Out[4] should be the correct behavior. Thoughts? If so, I can add a new issue and maybe a test in this PR that will xfail due to this.
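The two results correspond to two different readings of the same naive string, which can be spelled out explicitly. This sketch does not call the Series constructor with a tz-aware dtype, since the concern raised here is precisely how that constructor interprets its input; it only shows the two interpretations side by side.

```python
import pandas as pd

naive = pd.to_datetime(pd.Series(['2013-01-01 00:00:00']))

# Reading the string as CET wall time (the result shown in Out[4]).
as_wall_time = naive.dt.tz_localize('CET')

# Reading the string as UTC and then viewing it in CET (the result shown in Out[3]).
as_utc_then_cet = naive.dt.tz_localize('UTC').dt.tz_convert('CET')

print(as_wall_time[0])
print(as_utc_then_cet[0])
```

The two values are one hour apart, so the choice of interpretation changes the instant being represented and not just its display.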

@jorisvandenbossche
Member

I think all outstanding comments were addressed now, so time to finally get this merged :-)

> I also stumbled upon this inconsistency when I was trying to add an additional test for 'datetime64[ns, CET]' dtype:

I think Out[4] is certainly correct; for Out[3], I am not really sure.
(but I would certainly keep this for a separate issue / PR)

Labels
API Design Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: to_datetime ignores utc=True when arg is Series
5 participants