Don't identify decimals as strings. (#7710)
As documented in [this pandas issue](pandas-dev/pandas#15585), pandas' `is_string_dtype` is not strict and classifies many non-string types as strings. This is a problem for us because essentially every subclass of `ExtensionDtype` is classified as a string by that function. That is clearly incorrect, so I modified our version of `is_string_dtype` to explicitly reject all of our extension dtypes (previously it excluded only categorical types).

I'm not 100% confident that no other part of the code base relies on the current (erroneous) behavior, but the cudf tests all passed for me locally, and every call to `utils.is_string_dtype` that I traced looks like a place where the change gives more correct behavior, so I think our best bet is to move forward with this change. Any future problems caused by code relying on the old behavior should probably be treated as bugs in the calling code and fixed there. The same goes for external code that relied on this behavior: the change is potentially breaking for it as well, but again that is something its maintainers should address.

Authors:
  - Vyas Ramasubramani (@vyasr)

Approvers:
  - Keith Kraus (@kkraus14)

URL: #7710
vyasr authored Mar 25, 2021
1 parent eb92145 commit 1a1bd66
Showing 1 changed file with 9 additions and 1 deletion.
10 changes: 9 additions & 1 deletion python/cudf/cudf/utils/dtypes.py
@@ -154,7 +154,15 @@ def is_numerical_dtype(obj):


 def is_string_dtype(obj):
-    return pd.api.types.is_string_dtype(obj) and not is_categorical_dtype(obj)
+    return (
+        pd.api.types.is_string_dtype(obj)
+        # Reject all cudf extension types.
+        and not is_categorical_dtype(obj)
+        and not is_decimal_dtype(obj)
+        and not is_list_dtype(obj)
+        and not is_struct_dtype(obj)
+        and not is_interval_dtype(obj)
+    )


 def is_datetime_dtype(obj):
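
The effect of the change can be illustrated with a small pure-Python sketch. Everything in it is hypothetical: the `Fake*` classes and the three helper functions are stand-ins rather than cudf or pandas API, and the sketch assumes the permissive check keys only on an object-like `kind`, which is what lets every extension dtype slip through.

```python
# Hypothetical stand-ins for dtype objects; none of these are real cudf
# or pandas classes. Each reports kind "O" (object), the trait that an
# assumed kind-based string check would key on.
class FakeDtype:
    kind = "O"


class FakeStringDtype(FakeDtype):
    pass


class FakeCategoricalDtype(FakeDtype):
    pass


class FakeDecimalDtype(FakeDtype):
    pass


class FakeListDtype(FakeDtype):
    pass


class FakeStructDtype(FakeDtype):
    pass


class FakeIntervalDtype(FakeDtype):
    pass


def permissive_is_string_dtype(dtype):
    """Assumed permissive check: any object-like kind counts as a string."""
    return dtype.kind in ("O", "S", "U")


# Extension dtypes that must never be reported as strings.
NON_STRING_DTYPES = (
    FakeCategoricalDtype,
    FakeDecimalDtype,
    FakeListDtype,
    FakeStructDtype,
    FakeIntervalDtype,
)


def old_is_string_dtype(dtype):
    """Pre-change shape: only categoricals are rejected."""
    return permissive_is_string_dtype(dtype) and not isinstance(
        dtype, FakeCategoricalDtype
    )


def new_is_string_dtype(dtype):
    """Post-change shape: every known extension dtype is rejected."""
    return permissive_is_string_dtype(dtype) and not isinstance(
        dtype, NON_STRING_DTYPES
    )


if __name__ == "__main__":
    for dtype in (FakeStringDtype(), FakeDecimalDtype(), FakeListDtype()):
        name = type(dtype).__name__
        print(name, old_is_string_dtype(dtype), new_is_string_dtype(dtype))
```

In this sketch a decimal or list dtype passes the old check and fails the new one, while a genuine string dtype passes both. The actual commit chains `is_*_dtype` predicates instead of using `isinstance`, so the sketch mirrors only the logic, not the implementation.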