Misc Python/Cython optimizations #7686
@@ -144,16 +144,13 @@ def numeric_normalize_types(*args):


 def is_numerical_dtype(obj):

shwina marked this conversation as resolved.

-    if is_categorical_dtype(obj):
+    # TODO: we should handle objects with a `.dtype` attribute,
+    # e.g., arrays, here.
+    try:
+        dtype = np.dtype(obj)

What if someone gives us a pandas nullable integer type?

We certainly aren't handling this currently. On branch-0.19:

>>> cudf.Series([1, 2, 3], dtype=pd.Int64Dtype())  # TypeError
>>> cudf.utils.dtypes.is_numerical_dtype(pd.Int64Dtype())  # TypeError

I agree we should support this, but how to do so efficiently is a difficult question. @vyasr and I were talking about this a couple of days ago, and he has some ideas for how to make dtype introspection faster/cheaper. Perhaps we can take on this problem there?

+    except TypeError:
         return False
-    if is_list_dtype(obj):
-        return False
-    return (
-        np.issubdtype(obj, np.bool_)
-        or np.issubdtype(obj, np.floating)
-        or np.issubdtype(obj, np.signedinteger)
-        or np.issubdtype(obj, np.unsignedinteger)
-    )
+    return dtype.kind in "biuf"


 def is_string_dtype(obj):

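For readers following along, the kind-based check in this hunk can be tried in isolation. The snippet below is a minimal sketch that assumes only NumPy is installed; `is_numerical_dtype_sketch` is a hypothetical standalone copy of the new function body, not the actual `cudf.utils.dtypes` module, and the string "category" stands in for any input that NumPy cannot interpret as a dtype.

```python
import numpy as np


def is_numerical_dtype_sketch(obj):
    """Hypothetical standalone copy of the kind-based check in the diff."""
    try:
        dtype = np.dtype(obj)
    except TypeError:
        # NumPy could not interpret the input as a dtype, so it is
        # treated as non-numerical.
        return False
    # 'b' = boolean, 'i' = signed integer, 'u' = unsigned integer,
    # 'f' = floating point
    return dtype.kind in "biuf"


# Classify a few representative inputs.
results = {
    "int64": is_numerical_dtype_sketch("int64"),
    "float32": is_numerical_dtype_sketch(np.float32),
    "bool": is_numerical_dtype_sketch(np.bool_),
    "uint8": is_numerical_dtype_sketch(np.dtype("uint8")),
    "datetime64[ns]": is_numerical_dtype_sketch("datetime64[ns]"),
    "object": is_numerical_dtype_sketch("object"),
    "category": is_numerical_dtype_sketch("category"),
}

for name, is_numeric in results.items():
    print(f"{name}: {is_numeric}")
```

The single `np.dtype` call plus a membership test on `dtype.kind` replaces four `np.issubdtype` calls. Inputs that NumPy rejects with a `TypeError` take the `except` branch and are reported as non-numerical.
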
Numerical types being the most common [citation needed], and `is_numerical_dtype` now being quite fast, it makes sense to do this check first.

Still definitely pro doing this, but I'm working on prototyping the (amortized) constant-time approach I suggested and I'll update you once that's done. Hopefully that will make ordering concerns here largely moot.