[BUG] Incorrect dtype when iterating over dtypes in cudf.pandas #17165
Comments
I think the problem is due to our custom function for |
Okay, removing the custom iterator made your minimal repro work, but it could break other things (we'll see).
Fixes: #17165
Fixes: #14481

This PR properly wraps the result of the custom iterator.

```python
In [2]: import pandas as pd

In [3]: s = pd.Series([10, 1, 2, 3, 4, 5]*1000000)

# Without custom_iter:
In [4]: %timeit for i in s: True
6.34 s ± 25.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# This PR:
In [4]: %timeit for i in s: True
6.16 s ± 17.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# On `branch-24.12`:
1.53 s ± 6.27 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

I think `custom_iter` has to exist. Here is why: invoking any sort of iteration on GPU objects will raise errors, and thus in the end we fall back to CPU. Instead of trying to move the objects from host to device memory (if the object is on host memory only), we will avoid a CPU-to-GPU transfer.

Authors:
- GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
- Matthew Murray (https://github.com/Matt711)

URL: #17251
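To illustrate what "properly wraps the result of the custom iterator" can mean, here is a toy model in plain Python. It is not the cudf.pandas implementation: every class and method name below (`FastCategoricalDtype`, `CategoricalDtypeProxy`, `DtypesProxy`, `iter_unwrapped`) is hypothetical and exists only for this sketch. The idea is that a proxy container holds inner objects, and `isinstance` checks against the user-facing proxy type succeed only if iteration hands back wrapped objects, just as indexing does.

```python
class FastCategoricalDtype:
    """Hypothetical stand-in for the accelerated (GPU-side) dtype."""


class CategoricalDtypeProxy:
    """Hypothetical stand-in for the type users see and pass to isinstance."""

    def __init__(self, wrapped):
        self._wrapped = wrapped


class DtypesProxy:
    """Hypothetical proxy container holding inner (unwrapped) dtype objects."""

    def __init__(self, inner_items):
        self._inner = list(inner_items)

    def __getitem__(self, index):
        # Indexing wraps the inner object before returning it.
        return CategoricalDtypeProxy(self._inner[index])

    def iter_unwrapped(self):
        # Models the problematic behavior: inner objects leak out unwrapped.
        return iter(self._inner)

    def __iter__(self):
        # Models the fix: each yielded item is wrapped, matching __getitem__.
        for item in self._inner:
            yield CategoricalDtypeProxy(item)


dtypes = DtypesProxy([FastCategoricalDtype(), FastCategoricalDtype()])

leaky = [isinstance(d, CategoricalDtypeProxy) for d in dtypes.iter_unwrapped()]
wrapped = [isinstance(d, CategoricalDtypeProxy) for d in dtypes]
indexed = isinstance(dtypes[0], CategoricalDtypeProxy)
```

In this model, `leaky` contains only `False` values while `wrapped` and `indexed` agree, which mirrors the mismatch between iterating and indexing described in the issue.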
Describe the bug
When using cudf.pandas and iterating over the dtypes of a dataframe, categorical dtype objects are reported as cudf.CategoricalDtype and not pandas.CategoricalDtype, causing isinstance checks to fail unexpectedly.
Steps/Code to reproduce bug
Run the following using python -m cudf.pandas and compare to output without cudf.pandas
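A minimal sketch of such a comparison follows. It is not the reporter's original snippet; the DataFrame contents, the column names, and the helper `check_dtypes` are assumptions made for illustration. It runs the same `isinstance` check once while iterating over `df.dtypes` and once while selecting each dtype by position.

```python
import pandas as pd

# Hypothetical data: one categorical column and one integer column.
df = pd.DataFrame(
    {
        "a": pd.Series(["x", "y", "x"], dtype="category"),
        "b": [1, 2, 3],
    }
)


def check_dtypes(frame):
    """Return isinstance results obtained by iteration and by positional indexing."""
    by_iteration = [isinstance(dt, pd.CategoricalDtype) for dt in frame.dtypes]
    by_index = [
        isinstance(frame.dtypes.iloc[i], pd.CategoricalDtype)
        for i in range(len(frame.dtypes))
    ]
    return by_iteration, by_index


by_iteration, by_index = check_dtypes(df)
print("iteration:", by_iteration)
print("indexing: ", by_index)
```

With plain pandas the two lists are identical. Per the report, under cudf.pandas the iteration result could differ from the indexing result for the categorical column.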
Expected behavior
Output should be the same for the isinstance checks with and without cudf.pandas, regardless of whether we are iterating over dtypes or selecting them by index.
Environment details (please complete the following information):
conda list
Output:
Additional context
This prevents training an XGBoost model on categorical variables using cudf.pandas if the .plot method of a Series has been called beforehand. See #17166 for information on unexpected behavior from .plot.