Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More tslibs TODOS: small optimizations, trim namespaces #18446

Closed
wants to merge 6 commits into from

Conversation

jbrockmendel
Copy link
Member

A few methods of Timedelta and Timestamp that don't need to be user-facing are changed from cpdef to cdef. To make the wheels turn, Timestamp methods that call those methods are moved from Timestamp to _Timestamp.

Stronger typing for _Timestamp. _get_start_end_field and a couple others.

@cython.final supposedly allows for a small optimization because cython does not need to check at runtime whether a method has been overridden. Applied to Timestamp.__add__, Timestamp.__sub__

@jreback jreback added Clean Datetime Datetime data dtype labels Nov 23, 2017
@@ -246,6 +249,7 @@ cdef class _Timestamp(datetime):
result.nanosecond = self.nanosecond
return result

@cython.final
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you perf check this (you can just compare time-its for couple of ops in PR and master to see); if it makes a diff then add an asv (and if not let's not do this).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it makes a diff then add an asv

I'm shocked to see there isn't already an asv for "Timestamp addition/subtraction". I'll make one, or see if one isn't hidden somewhere other than benchmarks/timestamp.py. If it is somewhere else, will moving them around cause a problem for backward-comparability?

can you perf check this

Not seeing much difference, but %timeit results are jumping around more than I'd like. I've got a couple of questions queued up for the cython mailing list, will add this to them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm shocked to see there isn't already an asv for "Timestamp addition/subtraction". I'll make one, or see if one isn't hidden somewhere other than benchmarks/timestamp.py. If it is somewhere else, will moving them around cause a problem for backward-comparability?

no there are, its just asv's seem to be unstable for you :> (and this is prob a tiny difference)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no there are

Unless the instability extends to the contents of the benchmarks/ files, I'm pretty sure asv_bench/benchmarks/timestamp.py doesn't have any addition/subtraction benchmarks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timeseries.py has plenty

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, so the question about moving was if its OK to consolidate Timestamp asvs (kinda analogous to what I've been doing for the tests)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes sure. we definitly want compreshensive benchmarks for scalar + scalar (and scalar + index, etc). so consolidation is fine.

@jbrockmendel
Copy link
Member Author

Asked on the cython mailing list: turns out @cython.final doesn't apply to special methods including __add__ and __sub__. Will remove.

@jbrockmendel
Copy link
Member Author

Looks like because of cython implementation details, making _round cdef (and therefore not in the namespace) requires setting default values for freq in round, ceil, floor methods. Any objection to making those default to "D"?

@jreback
Copy link
Contributor

jreback commented Nov 24, 2017

Looks like because of cython implementation details, making _round cdef (and therefore not in the namespace) requires setting default values for freq in round, ceil, floor methods. Any objection to making those default to "D"?

don't set defaults, this is a required user-settable parameter. move them back to the class impl. not really sure these belong in the c class def anyhow (no benefit).

@jbrockmendel
Copy link
Member Author

move them back to the class impl. not really sure these belong in the c class def anyhow (no benefit).

Sure. The idea was to get _round out of the user-facing namespace. Not that big a deal. Will change shortly.

@codecov
Copy link

codecov bot commented Nov 24, 2017

Codecov Report

Merging #18446 into master will increase coverage by 0.17%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18446      +/-   ##
==========================================
+ Coverage   91.35%   91.53%   +0.17%     
==========================================
  Files         163      163              
  Lines       49691    50730    +1039     
==========================================
+ Hits        45397    46437    +1040     
+ Misses       4294     4293       -1
Flag Coverage Δ
#multiple 89.38% <ø> (+0.24%) ⬆️
#single 39.66% <ø> (-0.08%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️
pandas/core/indexes/multi.py 96.4% <0%> (-0.01%) ⬇️
pandas/core/api.py 100% <0%> (ø) ⬆️
pandas/io/pytables.py 92.84% <0%> (ø) ⬆️
pandas/core/reshape/api.py 100% <0%> (ø) ⬆️
pandas/core/sparse/frame.py 94.78% <0%> (ø) ⬆️
pandas/io/parsers.py 95.59% <0%> (ø) ⬆️
pandas/core/reshape/melt.py 96.22% <0%> (+0.03%) ⬆️
pandas/core/indexes/interval.py 92.64% <0%> (+0.11%) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fedc503...e4d5f7f. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 24, 2017

Codecov Report

Merging #18446 into master will increase coverage by 0.17%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18446      +/-   ##
==========================================
+ Coverage   91.35%   91.53%   +0.17%     
==========================================
  Files         163      163              
  Lines       49691    50730    +1039     
==========================================
+ Hits        45397    46437    +1040     
+ Misses       4294     4293       -1
Flag Coverage Δ
#multiple 89.38% <ø> (+0.24%) ⬆️
#single 39.66% <ø> (-0.08%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️
pandas/core/indexes/multi.py 96.4% <0%> (-0.01%) ⬇️
pandas/io/pytables.py 92.84% <0%> (ø) ⬆️
pandas/io/parsers.py 95.59% <0%> (ø) ⬆️
pandas/core/reshape/api.py 100% <0%> (ø) ⬆️
pandas/core/api.py 100% <0%> (ø) ⬆️
pandas/core/sparse/frame.py 94.78% <0%> (ø) ⬆️
pandas/core/reshape/melt.py 96.22% <0%> (+0.03%) ⬆️
pandas/core/indexes/interval.py 92.64% <0%> (+0.11%) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fedc503...e4d5f7f. Read the comment docs.

def days_in_month(self):
return self._get_field('dim')

cdef bint _get_start_end_field(_Timestamp self, field):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does moving this to _Timestamp help

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does moving this to _Timestamp help

Among the todos was getting _get_start_end_field and _get_field out of the user-facing namespace. This accomplishes that.

It also adds type declarations, so should be marginally more performant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we care about _get_start_end_field in the user namespace, its a private method?

type declarations are fine (pls run an asv to confirm)

def days_in_month(self):
return self._get_field('dim')

cdef bint _get_start_end_field(_Timestamp self, field):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we care about _get_start_end_field in the user namespace, its a private method?

type declarations are fine (pls run an asv to confirm)

@jbrockmendel
Copy link
Member Author

why do we care about _get_start_end_field in the user namespace, its a private method?

Not sure anymore. I probably put it on the todo list in my youth. Will mark it as fine-as-is.

type declarations are fine (pls run an asv to confirm)

OK. I'll close this for now and circle back; avoid clogging the CI queue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Datetime Datetime data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants