Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove duplicated logic from period_helper #19540

Merged
merged 10 commits into from
Feb 8, 2018

Conversation

jbrockmendel
Copy link
Member

Following on the heels of #19534, this removes a bunch of duplicate logic from period_helper and uses the implementations in np_datetime and ccalendar instead. A handful of functions are moved directly into period, but for the most part it's deletion.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@codecov
Copy link

codecov bot commented Feb 5, 2018

Codecov Report

Merging #19540 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #19540   +/-   ##
=======================================
  Coverage   91.62%   91.62%           
=======================================
  Files         150      150           
  Lines       48773    48773           
=======================================
  Hits        44687    44687           
  Misses       4086     4086
Flag Coverage Δ
#multiple 89.99% <ø> (ø) ⬆️
#single 41.74% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5052842...9d749d8. Read the comment docs.

@jreback jreback added Period Period data type Clean labels Feb 6, 2018
int get_yq(npy_int64 ordinal, int freq, int *quarter, int *year);
int _quarter_year(npy_int64 ordinal, int freq, int *year, int *quarter);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems an odd naming convention

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, _quarter_year is actually really redundant with get_yq, so they will be merged in the PR after next.

# period accessors

ctypedef int (*accessor)(int64_t ordinal, int freq) except INT32_MIN


cdef int pyear(int64_t ordinal, int freq):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these can be inline?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_accessor_func returns pointers to these functions. Does that affect inline-ability?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no idea. ok can you run a perf comparison on period stuff, maybe even vs 0.21.0 so we can get a good sense if anything has changed much.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler didn't complain about the inlining. Looking at timeit results the difference is indistinguishable. Will get asv going in a few minutes.

return DtoB_weekday(absdate);
}
static npy_int64 DtoB(struct date_info *dinfo, int roll_back) {
int day_of_week = dayofweek(dinfo->year, dinfo->month, dinfo->day);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assuming you are planning on removing these to cython as we likely already have routines that do exactly this (next PR)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DtoB actually stays in place at least through the next two PRs in this sequence. This PR is getting dayofweek from the implementation in src/datetime/np_datetime, so did remove the duplicated routine in period_helper.

@jbrockmendel
Copy link
Member Author

No change against master (running against ~0.21.0 now)

asv continuous -f 1.1 -E virtualenv master HEAD -b period
[...]
    before     after       ratio
  [93c86aa1] [9e88f869]
+   60.09μs    68.30μs      1.14  period.PeriodUnaryMethods.time_to_timestamp('M')
+   16.56ms    18.38ms      1.11  period.DataFramePeriodColumn.time_setitem_period_column
-   19.84μs    17.81μs      0.90  period.PeriodUnaryMethods.time_asfreq('M')
-  971.81ns   868.77ns      0.89  period.PeriodProperties.time_property('M', 'daysinmonth')
-   21.83μs    18.90μs      0.87  period.PeriodUnaryMethods.time_now('M')

  [93c86aa1] [9e88f869]
+  851.04ns   968.69ns      1.14  period.PeriodProperties.time_property('M', 'quarter')
-    1.03μs   874.42ns      0.85  period.PeriodProperties.time_property('M', 'dayofyear')

  [93c86aa1] [9e88f869]
+  110.80μs   166.18μs      1.50  period.PeriodIndexConstructor.time_from_date_range('D')

  [93c86aa1] [9e88f869]
+  151.88μs   244.22μs      1.61  period.Indexing.time_intersection
-    1.01μs   842.80ns      0.83  period.PeriodProperties.time_property('min', 'year')

@jbrockmendel
Copy link
Member Author

Against ~0.21

asv continuous -f 1.1 -E virtualenv 99d384d4f7ef6e2b6cb5a4eebea127e8bddf521b HEAD -b period
[...]
    before     after       ratio
  [99d384d4] [9e88f869]
-    3.42ms     2.82ms      0.83  period.PeriodIndexConstructor.time_from_pydatetime('D')
-   21.24ms    16.26ms      0.77  period.DataFramePeriodColumn.time_setitem_period_column
-  263.50μs   150.22μs      0.57  period.Indexing.time_intersection
-   13.36μs     7.34μs      0.55  period.Indexing.time_get_loc

    before     after       ratio
  [99d384d4] [9e88f869]
+  875.00ns   979.92ns      1.12  period.PeriodProperties.time_property('min', 'quarter')
-   21.69μs    19.19μs      0.88  period.PeriodUnaryMethods.time_now('M')
-    3.38ms     2.80ms      0.83  period.PeriodIndexConstructor.time_from_pydatetime('D')
-   21.53ms    16.57ms      0.77  period.DataFramePeriodColumn.time_setitem_period_column

@jreback
Copy link
Contributor

jreback commented Feb 7, 2018

rebase as we merged the get_day function. lgtm otherwise.

@jbrockmendel
Copy link
Member Author

Once this goes in I can rebase #19550, then after that we'll get rid of date_info and just use pandas_datetimestruct directly.


@cython.cdivision
cdef char* c_strftime(date_info *dinfo, char *fmt):
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as usual want to write doc-strings for things

@jreback jreback added this to the 0.23.0 milestone Feb 8, 2018
@jreback jreback merged commit 31973f5 into pandas-dev:master Feb 8, 2018
@jreback
Copy link
Contributor

jreback commented Feb 8, 2018

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Period Period data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants