[BUG] get_dummies fails in dask-cudf due to dask categorical type checking #7111
Comments
Looks like there's a lot of enhancement going on for cuDF.dtype in #6160 and some of it touches
EDIT: It looks like this might be handled already, by CategoricalDtype inheriting from Generic, which inherits from pandas' ExtensionDtype.
Hey @beckernick, I put some thoughts in the other dask thread.
Pardon the drive-by comment, but while you are fixing this bug, consider renaming the function to
Collecting from the related Dask issue dask/dask#7054 , it seems like there are several items of interest.
I have been looking into this a bit, and from what I can tell, the change to fix this could be handled in any of Dask, Pandas, or cuDF (which is part of what I think makes it hard to figure out where to even discuss this). The issue is that dask wants to use
With that in mind then, I think these are the possible solutions:
Does anyone see any other solutions, or does anyone prefer one of the ones listed here?
This should be the least preferred option. A
This sounds like the right approach perhaps, where cuDF's |
#7740) Fixes #7111 Closes #7400 Authors: - @brandon-b-miller - Keith Kraus (@kkraus14) Approvers: - Keith Kraus (@kkraus14) - GALI PREM SAGAR (@galipremsagar) - Ray Douglass (@raydouglass) URL: #7740
Calling `dd.get_dummies` fails with dask-cudf due to Dask's reliance on the `pd.api.types.is_categorical_dtype` check from pandas. Our categorical columns do not return True for this check. Instead, we can use `cudf.utils.dtype.is_categorical_dtype`.

This issue is for tracking purposes. We'll (probably) want to abstract this from pandas in upstream Dask.
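
As a rough illustration of what "abstracting this from pandas" could look like, here is a minimal sketch of a dispatch-based categorical check. It uses only the standard library, and every class and function name in it is a hypothetical stand-in, not the actual Dask, pandas, or cuDF API. The idea is that the caller asks a dispatchable function, and each backend registers its own dtype class, so the check no longer depends on one library's type test.

```python
from functools import singledispatch


# Hypothetical stand-ins for backend dtype classes (not real pandas/cuDF types).
class CpuCategoricalDtype:
    """Plays the role of the CPU backend's categorical dtype."""


class GpuCategoricalDtype:
    """Plays the role of the GPU backend's categorical dtype."""


class Int64Dtype:
    """Plays the role of any non-categorical dtype."""


@singledispatch
def is_categorical_dtype(dtype) -> bool:
    # Fallback: a dtype no backend has registered is treated as non-categorical.
    return False


@is_categorical_dtype.register(CpuCategoricalDtype)
def _cpu_is_categorical(dtype) -> bool:
    return True


@is_categorical_dtype.register(GpuCategoricalDtype)
def _gpu_is_categorical(dtype) -> bool:
    # The GPU backend opts in here instead of having to satisfy the CPU check.
    return True


def all_categorical(dtypes) -> bool:
    """Hypothetical pre-check a get_dummies-style function might run on its columns."""
    return all(is_categorical_dtype(d) for d in dtypes)


if __name__ == "__main__":
    print(all_categorical([CpuCategoricalDtype(), GpuCategoricalDtype()]))
    print(all_categorical([GpuCategoricalDtype(), Int64Dtype()]))
```

With this shape, a new backend only has to register its dtype class. The calling code stays unchanged.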