[FEA] Combine categories on merge #3496
Comments
Able to repro on latest 0.12, investigating.
@brandon-b-miller I think your join implicit typecasting PR may take care of this? If not, I think we should just do the minimal amount of work to throw an exception if the categories aren't synchronized for the time being, and then wait for libcudf support for categorical columns.
@kkraus14 It doesn't yet, but it shouldn't be hard to tweak so that it raises in this case. I agree it doesn't really make sense to merge two categorical columns that specify different categories.
I actually think it does make sense to merge them; it's just that, with categorical columns being implemented in libcudf, we shouldn't spend effort on something that will just get ripped out in the near future anyway.
I think it only makes sense strictly in the unordered case. In the ordered case we'd have to somehow determine the ordering for the superset containing all the possible categories.
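To make the unordered/ordered distinction concrete, here is a minimal pure-Python sketch, not cuDF or libcudf code. It assumes a categorical column is represented as a list of categories plus integer codes, with `-1` standing for a null; the function name `union_unordered_categories` is hypothetical. For unordered columns it builds the union of the categories and remaps both sets of codes; for ordered columns with differing categories it raises, since no ordering of the superset is defined.

```python
from typing import List, Sequence, Tuple


def union_unordered_categories(
    left_cats: Sequence[str],
    left_codes: Sequence[int],
    right_cats: Sequence[str],
    right_codes: Sequence[int],
    ordered: bool = False,
) -> Tuple[List[str], List[int], List[int]]:
    """Return (combined categories, remapped left codes, remapped right codes).

    Hypothetical helper: a code of -1 is assumed to mean "null".
    """
    if ordered and list(left_cats) != list(right_cats):
        # No well-defined ordering exists for the superset of categories.
        raise ValueError(
            "cannot combine ordered categoricals with different categories"
        )

    # Union that keeps the left categories first, then new ones from the right.
    combined = list(left_cats)
    seen = set(combined)
    for cat in right_cats:
        if cat not in seen:
            combined.append(cat)
            seen.add(cat)

    position = {cat: i for i, cat in enumerate(combined)}

    def remap(cats: Sequence[str], codes: Sequence[int]) -> List[int]:
        # Translate each old code into its position in the combined categories.
        return [position[cats[c]] if c >= 0 else -1 for c in codes]

    return combined, remap(left_cats, left_codes), remap(right_cats, right_codes)


cats, left, right = union_unordered_categories(
    ["a", "b"], [0, 1, 0], ["b", "c"], [0, 1, -1]
)
```

After the remap, equal codes on both sides refer to the same category, so the two columns can be compared directly.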
Yea, good point. Pandas implicitly converts categoricals to
@brandon-b-miller is this still an issue after the porting?
This should raise for any code that includes the implicit typecasting.
We should be seeing that instead of incorrect results.
An error would definitely be better than the current results. Still, I think it makes sense to keep this issue open as a feature request to support joining categorical columns whose categories do not match. The join might actually be faster, especially when there are very few categories and merging the categories would be cheap.
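As a rough illustration of why joining on mismatched categories can be cheap, the sketch below (pure Python, not cuDF's implementation) translates left codes to right codes once per category and then joins on integers. The `(categories, codes)` representation, the `-1` null code, and the function names `join_categorical_keys` and `join_string_keys` are assumptions made for this example. The string join is included only to show that both approaches produce the same row pairs.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def join_categorical_keys(
    left_cats: Sequence[str],
    left_codes: Sequence[int],
    right_cats: Sequence[str],
    right_codes: Sequence[int],
) -> List[Tuple[int, int]]:
    """Inner join on categorical keys; returns (left_row, right_row) pairs."""
    right_pos = {cat: i for i, cat in enumerate(right_cats)}
    # One lookup per left category, independent of the number of rows.
    translate: List[Optional[int]] = [right_pos.get(cat) for cat in left_cats]

    # Index right rows by their integer code.
    rows_by_code: Dict[int, List[int]] = defaultdict(list)
    for j, code in enumerate(right_codes):
        rows_by_code[code].append(j)

    pairs: List[Tuple[int, int]] = []
    for i, code in enumerate(left_codes):
        if code < 0:
            continue  # assumed null code: never matches
        right_code = translate[code]
        if right_code is not None:
            pairs.extend((i, j) for j in rows_by_code.get(right_code, []))
    return pairs


def join_string_keys(
    left: Sequence[str], right: Sequence[str]
) -> List[Tuple[int, int]]:
    """Reference inner join on plain string keys."""
    rows: Dict[str, List[int]] = defaultdict(list)
    for j, value in enumerate(right):
        rows[value].append(j)
    return [(i, j) for i, value in enumerate(left) for j in rows.get(value, [])]


left_cats, left_codes = ["a", "b", "c"], [0, 1, 2, 1]
right_cats, right_codes = ["b", "c", "d"], [1, 0, 2, 0]

cat_pairs = join_categorical_keys(left_cats, left_codes, right_cats, right_codes)
str_pairs = join_string_keys(
    [left_cats[c] for c in left_codes], [right_cats[c] for c in right_codes]
)
```

Here the string comparisons happen only while building `translate`, so their cost scales with the number of categories rather than the number of rows.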
Issue updated to reflect the feature request.
I would expect to get the same output for string and categorical columns used in a join.