-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT: Refactor logic in maybe_convert_* #15018
Conversation
you need to pass the structure to the functions themselves but i think a class is better here actually |
e4fa8f4
to
b74baac
Compare
@jreback : Good catch regarding classes. Fixed. |
@@ -629,23 +676,31 @@ cdef int64_t iINT64_MIN = <int64_t> INT64_MIN | |||
cdef uint64_t iUINT64_MAX = <uint64_t> UINT64_MAX | |||
|
|||
|
|||
cdef inline bint _check_uint64_nan(bint seen_uint, bint seen_null, | |||
bint coerce_numeric) except -1: | |||
cdef inline bint _check_uint64_nan(Seen seen) except -1: | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this should be a method of Seen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -654,22 +709,30 @@ cdef inline bint _check_uint64_nan(bint seen_uint, bint seen_null, | |||
return False | |||
|
|||
|
|||
cdef inline bint _check_uint64_int64_conflict(bint seen_sint, bint seen_uint, | |||
bint coerce_numeric) except -1: | |||
cdef inline bint _check_uint64_int64_conflict(Seen seen) except -1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this too
coerce_numeric): | ||
return values | ||
|
||
seen.seen_null = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make each of these cases where you are currently setting a method of Seen instead
e.g.
def saw_float(self):
self.seen_nul = True
self.seen_float = True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
_check_uint64_int64_conflict(seen_sint, seen_uint, | ||
coerce_numeric)): | ||
return values | ||
seen.seen_int = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def saw_int(self, int val):
self.seen_int = True
...
this way the loop is pretty simple then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
""" | ||
|
||
cdef public: | ||
bint seen_int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also change these variables to something like
_bool
_null
_uint
makes reading much easier w/o all the extra seen (alternatively sbool, snull, suint, sint, sfloat
.....)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
b74baac
to
0bd265c
Compare
Current coverage is 84.77% (diff: 100%)@@ master #15018 diff @@
==========================================
Files 145 145
Lines 51129 51129
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
Hits 43344 43344
Misses 7785 7785
Partials 0 0
|
@jreback : Addressed all comments, and everything is green still. |
|
||
return False | ||
|
||
cdef inline saw_null(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, and maybe you have a better refence. that it doesn't matter if you cdef a method in a cdef class, by-definiiion they are always cdef. e.g. in Timestamp
we mark these def
actually.
I also am not sure of the inline
when in a cdef class (IOW it may or may not make any difference).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if it is necessary either, but I do see cdef
(cdef inline
in the codebase) for cdef
class methods (e.g. Textreader
). I don't think it could hurt to be explicit.
Adding inline
is a performance thing, and couldn't hurt here since we aren't doing anything complicated in this function.
|
||
return False | ||
|
||
cdef inline bint check_uint64_int64_conflict(self) except -1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you are now calling both of these routines check_uint64.....
always together with an or?
if so, any reason to combine them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes perfect sense. Done.
0bd265c
to
2f696a5
Compare
@jreback : Made the requested changes, and everything is green. |
|
||
if not seen.sobject: | ||
if not safe: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont' you need:
if seen.check_uint64_conflict():
somewhere here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No actually because when we detect conflicts, seen.sobject
is set to 1, so that whole branch of logic is skipped.
Small question (and sorry that you already changed all those once before), but can you name it |
@jorisvandenbossche : Fair enough (what do they mean, just curious?). With underscores (as @jreback suggested) or none? I'm inclined to go with no underscores since not declaring them public makes it pretty clear I think that they are private but want to double check. |
underscores are not needed here I think (if we add an underscore, I would make the class (sint -> saint, snul(l) -> nitwit/squib (don't know the exact translation)) |
@jorisvandenbossche : Very well. Can rename without underscores. |
2f696a5
to
c62bb1d
Compare
i don't think u can name a struct member the same as its type e.g. int int is not valid code (python or c) |
i see the renaming - that is ok lgtm then |
break | ||
|
||
seen_numeric = seen_complex or seen_float or seen_int | ||
seen.numeric_ = seen.complex_ or seen.float_ or seen.int_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seen.numeric_
should be a property in Seen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if not seen_bool and not seen_datetime and not seen_timedelta: | ||
if seen_complex: | ||
if seen.null_: | ||
if (not seen.bool_ and not seen.datetime_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be less code here if you would make this compound if as a property to Seen, though only if it has an intutie name, maybe is_float_or_complex
(you might also need to pass safe
to the Seen constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if seen_complex: | ||
if not seen_int: | ||
if seen.null_: | ||
if (not seen.bool_ and not seen.datetime_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could use the same is_float_or_complex
here as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@gfyoung my last 2 comments are meant, not really to simplify the code much, but more to make it a bit easier to read w/o having to think thru all of the possible cases. |
Creates a Seen class to abstract objects encountered when doing type conversions.
c62bb1d
to
531a361
Compare
@jreback : Certainly. There was another similar refactoring that I did for |
lgtm. just give a quick perf test (locally) and ping on green. |
@jreback : No major perf impact, and everything is green. Ready to merge if there are no other concerns. |
thanks! |
Creates a Seen class to abstract objects encountered when doing type conversions.
Follow-up to #15005.