Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Last of the timezones funcs #17669

Merged
merged 14 commits into from
Sep 29, 2017
Merged

Conversation

jbrockmendel
Copy link
Member

Moves tools.datetimes._infer_tzinfo to tslibs.timezones._infer_tzinfo.

Refactor a large timezone-specific chunk of tslib.tz_localize_to_utc to tslibs.timezones._infer_dst

The new func timezones._infer_dst is taken out of tslib.tz_localize_to_utc. It is very nearly a cut/paste. The only change is removed a couple calls to Timestamp that are used for rendering an exception message, used np.datetime64 in its place.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

refactor a large timezone-specific chunk of tslib.tz_localize_to_utc to tslibs.timezones._infer_dst
ndarray[int64_t] result_b):
cdef:
Py_ssize_t n = len(vals)
ndarray[int64_t] dst_hours
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sure u r typing s the original
there are lots of issues here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The typings should be identical. There are a couple of variables that do not have type declarations; those can be added.

def _infer_tzinfo(start, end):
def _infer(a, b):
tz = a.tzinfo
if b and b.tzinfo:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this actually used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outside of tests, its used once in indexes.datetimes

@@ -275,3 +275,84 @@ cdef object get_dst_info(object tz):
dst_cache[cache_key] = (trans, deltas, typ)

return dst_cache[cache_key]


def _infer_tzinfo(start, end):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

de-privatize these

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k


def _infer_tzinfo(start, end):
def _infer(a, b):
tz = a.tzinfo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be an outside function

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to bother changing it, might as well get rid of _infer. It's only actually relevant in one case (of four)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@@ -105,7 +105,7 @@ from tslibs.timezones cimport (
is_utc, is_tzlocal, is_fixed_offset,
treat_tz_as_dateutil, treat_tz_as_pytz,
get_timezone, get_utcoffset, maybe_get_tz,
get_dst_info
get_dst_info, _infer_dst
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's give this a more descriptive name: infer_dst_transitions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure


def _infer_tzinfo(start, end):
def _infer(a, b):
tz = a.tzinfo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

return tz


cdef ndarray[int64_t] _infer_dst(ndarray[int64_t] vals,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are removing part of this function. I think its better to move almost all of it to timezones or leave it. (ok with latter for now, maybe do former at some point). Otherwise you end up splitting the logic / comments in 2 places.

I think this requires a bit more thought here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the appropriate block to separate out. It is exclusively focused on getting DST info for use in the original function. The rest of the function is thematically more related to _TSObject conversion.

In particular, pandas_datetimestruct doesn't belong anywhere near timezones.

@jreback jreback added the Timezones Timezone data dtype label Sep 25, 2017
@codecov
Copy link

codecov bot commented Sep 26, 2017

Codecov Report

Merging #17669 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17669      +/-   ##
==========================================
- Coverage   91.25%   91.23%   -0.02%     
==========================================
  Files         163      163              
  Lines       49808    49798      -10     
==========================================
- Hits        45454    45435      -19     
- Misses       4354     4363       +9
Flag Coverage Δ
#multiple 89.03% <100%> (-0.01%) ⬇️
#single 40.31% <100%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/tools/datetimes.py 84.74% <100%> (-0.51%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.73% <0%> (-0.1%) ⬇️
pandas/core/indexes/base.py 96.29% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e0fe5cc...355cbe8. Read the comment docs.

@codecov
Copy link

codecov bot commented Sep 26, 2017

Codecov Report

Merging #17669 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17669      +/-   ##
==========================================
+ Coverage   91.24%   91.25%   +<.01%     
==========================================
  Files         163      163              
  Lines       49766    49770       +4     
==========================================
+ Hits        45411    45417       +6     
+ Misses       4355     4353       -2
Flag Coverage Δ
#multiple 89.04% <100%> (+0.02%) ⬆️
#single 40.33% <100%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/tools/datetimes.py 83.03% <ø> (-0.77%) ⬇️
pandas/core/indexes/datetimes.py 95.53% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.73% <0%> (-0.1%) ⬇️
pandas/core/indexes/interval.py 92.85% <0%> (ø) ⬆️
pandas/core/sparse/array.py 91.58% <0%> (ø) ⬆️
pandas/compat/numpy/function.py 93.33% <0%> (+0.2%) ⬆️
pandas/plotting/_converter.py 65.2% <0%> (+1.81%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 074b485...ce701e5. Read the comment docs.

both_eq = result_a == result_b
trans_idx = np.squeeze(np.nonzero(np.logical_and(both_nat, ~both_eq)))
if trans_idx.size == 1:
stamp = Timestamp(vals[trans_idx])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this routine.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just do the infer_tzinfo for now

@jreback jreback added this to the 0.21.0 milestone Sep 28, 2017
@jreback
Copy link
Contributor

jreback commented Sep 28, 2017

ok rebase, ping on green.

@jreback jreback merged commit 54f6648 into pandas-dev:master Sep 29, 2017
@jreback
Copy link
Contributor

jreback commented Sep 29, 2017

thanks

ghost pushed a commit to reef-technologies/pandas that referenced this pull request Oct 2, 2017
* 'master' of github.com:pandas-dev/pandas: (188 commits)
  Separate out _convert_datetime_to_tsobject (pandas-dev#17715)
  DOC: remove whatsnew note for xref pandas-dev#17131
  BUG: Regression in .loc accepting a boolean Index as an indexer (pandas-dev#17738)
  DEPR: Deprecate cdate_range and merge into bdate_range (pandas-dev#17691)
  CLN: replace %s syntax with .format in pandas.core: categorical, common, config, config_init (pandas-dev#17735)
  Fixed the memory usage explanation of categorical in gotchas from O(nm) to O(n+m) (pandas-dev#17736)
  TST: add backward compat for offset testing for pickles (pandas-dev#17733)
  remove unused time conversion funcs (pandas-dev#17711)
  DEPR: Deprecate convert parameter in take (pandas-dev#17352)
  BUG:Time Grouper bug fix when applied for list groupers (pandas-dev#17587)
  BUG: Fix some PeriodIndex resampling issues (pandas-dev#16153)
  BUG: Fix unexpected sort in groupby (pandas-dev#17621)
  DOC: Fixed typo in documentation for 'pandas.DataFrame.replace' (pandas-dev#17731)
  BUG: Fix series rename called with str altering name rather index (GH17407) (pandas-dev#17654)
  DOC: Add examples for MultiIndex.get_locs + cleanups (pandas-dev#17675)
  Doc improvements for IntervalIndex and Interval (pandas-dev#17714)
  BUG: DataFrame sort_values and multiple "by" columns fails to order NaT correctly
  Last of the timezones funcs (pandas-dev#17669)
  Add missing file to _pyxfiles, delete commented-out (pandas-dev#17712)
  update imports of DateParseError, remove unused imports from tslib (pandas-dev#17713)
  ...
@jbrockmendel jbrockmendel deleted the tslibs-timezones7 branch October 30, 2017 16:23
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants