Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: Refactor logic in maybe_convert_* #15018

Closed

Conversation

gfyoung
Copy link
Member

@gfyoung gfyoung commented Dec 30, 2016

Creates a Seen class to abstract objects encountered when doing type conversions.

Follow-up to #15005.

@jreback
Copy link
Contributor

jreback commented Dec 30, 2016

you need to pass the structure to the functions themselves

but i think a class is better here actually

@gfyoung gfyoung force-pushed the maybe-convert-refactor branch from e4fa8f4 to b74baac Compare December 31, 2016 01:30
@gfyoung
Copy link
Member Author

gfyoung commented Dec 31, 2016

@jreback : Good catch regarding classes. Fixed.

@@ -629,23 +676,31 @@ cdef int64_t iINT64_MIN = <int64_t> INT64_MIN
cdef uint64_t iUINT64_MAX = <uint64_t> UINT64_MAX


cdef inline bint _check_uint64_nan(bint seen_uint, bint seen_null,
bint coerce_numeric) except -1:
cdef inline bint _check_uint64_nan(Seen seen) except -1:
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this should be a method of Seen

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -654,22 +709,30 @@ cdef inline bint _check_uint64_nan(bint seen_uint, bint seen_null,
return False


cdef inline bint _check_uint64_int64_conflict(bint seen_sint, bint seen_uint,
bint coerce_numeric) except -1:
cdef inline bint _check_uint64_int64_conflict(Seen seen) except -1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this too

coerce_numeric):
return values

seen.seen_null = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make each of these cases where you are currently setting a method of Seen instead

e.g.

def saw_float(self):
    self.seen_nul = True
    self.seen_float = True

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

_check_uint64_int64_conflict(seen_sint, seen_uint,
coerce_numeric)):
return values
seen.seen_int = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def saw_int(self, int val):
    self.seen_int = True
...

this way the loop is pretty simple then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jreback jreback added Clean Dtype Conversions Unexpected or buggy dtype conversions labels Dec 31, 2016
"""

cdef public:
bint seen_int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also change these variables to something like

_bool
_null
_uint

makes reading much easier w/o all the extra seen (alternatively sbool, snull, suint, sint, sfloat.....)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@gfyoung gfyoung force-pushed the maybe-convert-refactor branch from b74baac to 0bd265c Compare December 31, 2016 05:56
@codecov-io
Copy link

codecov-io commented Dec 31, 2016

Current coverage is 84.77% (diff: 100%)

Merging #15018 into master will not change coverage

@@             master     #15018   diff @@
==========================================
  Files           145        145          
  Lines         51129      51129          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits          43344      43344          
  Misses         7785       7785          
  Partials          0          0          

Powered by Codecov. Last update 9947a99...531a361

@gfyoung
Copy link
Member Author

gfyoung commented Dec 31, 2016

@jreback : Addressed all comments, and everything is green still.


return False

cdef inline saw_null(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, and maybe you have a better refence. that it doesn't matter if you cdef a method in a cdef class, by-definiiion they are always cdef. e.g. in Timestamp we mark these def actually.

I also am not sure of the inline when in a cdef class (IOW it may or may not make any difference).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it is necessary either, but I do see cdef (cdef inline in the codebase) for cdef class methods (e.g. Textreader). I don't think it could hurt to be explicit.

Adding inline is a performance thing, and couldn't hurt here since we aren't doing anything complicated in this function.


return False

cdef inline bint check_uint64_int64_conflict(self) except -1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you are now calling both of these routines check_uint64..... always together with an or?

if so, any reason to combine them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes perfect sense. Done.

@gfyoung gfyoung force-pushed the maybe-convert-refactor branch from 0bd265c to 2f696a5 Compare December 31, 2016 19:22
@gfyoung
Copy link
Member Author

gfyoung commented Dec 31, 2016

@jreback : Made the requested changes, and everything is green.


if not seen.sobject:
if not safe:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont' you need:

if seen.check_uint64_conflict(): somewhere here?

Copy link
Member Author

@gfyoung gfyoung Jan 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No actually because when we detect conflicts, seen.sobject is set to 1, so that whole branch of logic is skipped.

@jreback jreback added this to the 0.20.0 milestone Jan 1, 2017
@jorisvandenbossche
Copy link
Member

Small question (and sorry that you already changed all those once before), but can you name it seen.int, seen.null, etc instead of see.sint, seen.snull, .. ?
IMO that reads easier (and sint and snull actually mean other things, or at least in Dutch :-))

@gfyoung
Copy link
Member Author

gfyoung commented Jan 2, 2017

@jorisvandenbossche : Fair enough (what do they mean, just curious?). With underscores (as @jreback suggested) or none? I'm inclined to go with no underscores since not declaring them public makes it pretty clear I think that they are private but want to double check.

@jorisvandenbossche
Copy link
Member

underscores are not needed here I think (if we add an underscore, I would make the class _Seen (the instantation can still be seen)

(sint -> saint, snul(l) -> nitwit/squib (don't know the exact translation))

@gfyoung
Copy link
Member Author

gfyoung commented Jan 2, 2017

@jorisvandenbossche : Very well. Can rename without underscores.

@gfyoung gfyoung force-pushed the maybe-convert-refactor branch from 2f696a5 to c62bb1d Compare January 2, 2017 11:27
@jreback
Copy link
Contributor

jreback commented Jan 2, 2017

i don't think u can name a struct member the same as its type e.g.

int int

is not valid code (python or c)

@jreback
Copy link
Contributor

jreback commented Jan 2, 2017

i see the renaming - that is ok

lgtm then

break

seen_numeric = seen_complex or seen_float or seen_int
seen.numeric_ = seen.complex_ or seen.float_ or seen.int_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seen.numeric_ should be a property in Seen

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if not seen_bool and not seen_datetime and not seen_timedelta:
if seen_complex:
if seen.null_:
if (not seen.bool_ and not seen.datetime_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be less code here if you would make this compound if as a property to Seen, though only if it has an intutie name, maybe is_float_or_complex (you might also need to pass safe to the Seen constructor

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if seen_complex:
if not seen_int:
if seen.null_:
if (not seen.bool_ and not seen.datetime_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could use the same is_float_or_complex here as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jreback
Copy link
Contributor

jreback commented Jan 2, 2017

@gfyoung my last 2 comments are meant, not really to simplify the code much, but more to make it a bit easier to read w/o having to think thru all of the possible cases.

Creates a Seen class to abstract objects
encountered when doing type conversions.
@gfyoung gfyoung force-pushed the maybe-convert-refactor branch from c62bb1d to 531a361 Compare January 2, 2017 19:37
@gfyoung
Copy link
Member Author

gfyoung commented Jan 2, 2017

@jreback : Certainly. There was another similar refactoring that I did for is_bool too.

@jreback
Copy link
Contributor

jreback commented Jan 2, 2017

lgtm. just give a quick perf test (locally) and ping on green.

@gfyoung
Copy link
Member Author

gfyoung commented Jan 3, 2017

@jreback : No major perf impact, and everything is green. Ready to merge if there are no other concerns.

@jreback jreback closed this in 136a6fb Jan 3, 2017
@jreback
Copy link
Contributor

jreback commented Jan 3, 2017

thanks!

@gfyoung gfyoung deleted the maybe-convert-refactor branch January 3, 2017 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants