[BUG] ColumnTransformer error when you pass pandas.Index as columns #4435

Closed
codeananda opened this issue Dec 9, 2021 · 2 comments
Labels: ? - Needs Triage (Need team to review and classify), bug (Something isn't working)

Minimal code to reproduce:

import numpy as np
import cudf
from cuml.experimental.preprocessing import ColumnTransformer
from cuml.preprocessing import RobustScaler

data = np.arange(12).reshape(3, 4)
X = cudf.DataFrame(data=data, columns=['a', 'b', 'c', 'd'])

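# pandas.Index holding the first two column labels
col_index = X.columns[:2]
# The same labels as a plain Python list
col_index_list = list(col_index)

# Passing the pandas.Index as the column selector triggers the error below
ct = ColumnTransformer([
    ('scaler', RobustScaler(), col_index)
])

ct.fit_transform(X)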

Note: passing col_index_list instead of col_index to the ColumnTransformer works fine.
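
Until a fix is available, the workaround in the note can be wrapped in a small helper. This is only a sketch: `as_column_list` is a hypothetical name and is not part of cuML or cuDF. It assumes the column selector is either a single label or an iterable of labels (a list, a tuple, or an index-like object exposing `tolist()`).

```python
def as_column_list(columns):
    """Return column labels as a plain Python list.

    Hypothetical helper (not part of cuML or cuDF): it only converts
    index-like selectors to lists, which is the form that works in the
    reproduction above.
    """
    # A single string label is left untouched so it is not split into characters.
    if isinstance(columns, str):
        return columns
    # Index-like objects (for example pandas.Index) expose tolist().
    if hasattr(columns, "tolist"):
        return list(columns.tolist())
    # Tuples, lists and other iterables of labels.
    return list(columns)
```

In the reproduction above, this would mean passing `as_column_list(col_index)` as the third element of the `('scaler', RobustScaler(), ...)` tuple, which is equivalent to passing `col_index_list`.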

Error message

---------------------------------------------------------------------------
Empty                                     Traceback (most recent call last)
/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
    819             try:
--> 820                 tasks = self._ready_batches.get(block=False)
    821             except queue.Empty:

/opt/conda/envs/rapids/lib/python3.7/queue.py in get(self, block, timeout)
    166                 if not self._qsize():
--> 167                     raise Empty
    168             elif timeout is None:

Empty: 

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-61-c058d4fbb573> in <module>
      9 ])
     10 
---> 11 ct.fit_transform(X)

/opt/conda/envs/rapids/lib/python3.7/site-packages/cuml/internals/api_decorators.py in inner_get(*args, **kwargs)
    584 
    585                 # Call the function
--> 586                 ret_val = func(*args, **kwargs)
    587 
    588             return cm.process_return(ret_val)

/opt/conda/envs/rapids/lib/python3.7/site-packages/cuml/_thirdparty/sklearn/preprocessing/_column_transformer.py in fit_transform(self, X, y)
    876         self._validate_remainder(X)
    877 
--> 878         result = self._fit_transform(X, y, _fit_transform_one)
    879 
    880         if not result:

/opt/conda/envs/rapids/lib/python3.7/site-packages/cuml/_thirdparty/sklearn/preprocessing/_column_transformer.py in _fit_transform(self, X, y, func, fitted)
    813                     message=self._log_message(name, idx, len(transformers)))
    814                 for idx, (name, trans, column, weight) in enumerate(
--> 815                         self._iter(fitted=fitted, replace_strings=True), 1))
    816         except ValueError as e:
    817             if "Expected 2D array, got 1D array instead" in str(e):

/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
   1039             # remaining jobs.
   1040             self._iterating = False
-> 1041             if self.dispatch_one_batch(iterator):
   1042                 self._iterating = self._original_iterator is not None
   1043 

/opt/conda/envs/rapids/lib/python3.7/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
    829                 big_batch_size = batch_size * n_jobs
    830 
--> 831                 islice = list(itertools.islice(iterator, big_batch_size))
    832                 if len(islice) == 0:
    833                     return False

/opt/conda/envs/rapids/lib/python3.7/site-packages/cuml/_thirdparty/sklearn/preprocessing/_column_transformer.py in <genexpr>(.0)
    812                     message_clsname='ColumnTransformer',
    813                     message=self._log_message(name, idx, len(transformers)))
--> 814                 for idx, (name, trans, column, weight) in enumerate(
    815                         self._iter(fitted=fitted, replace_strings=True), 1))
    816         except ValueError as e:

/opt/conda/envs/rapids/lib/python3.7/site-packages/cuml/_thirdparty/sklearn/preprocessing/_column_transformer.py in _safe_indexing(X, indices, axis)
    238 
    239     if hasattr(X, "iloc"):
--> 240         return _pandas_indexing(X, indices, indices_dtype, axis=axis)
    241     elif hasattr(X, "shape"):
    242         return _array_indexing(X, indices, indices_dtype, axis=axis)

/opt/conda/envs/rapids/lib/python3.7/site-packages/cuml/_thirdparty/sklearn/preprocessing/_column_transformer.py in _pandas_indexing(X, key, key_dtype, axis)
    263         # Work-around for indexing with read-only key in pandas
    264         # FIXME: solved in pandas 0.25
--> 265         key = np.asarray(key)
    266         key = key if key.flags.writeable else key.copy()
    267     elif isinstance(key, tuple):

/opt/conda/envs/rapids/lib/python3.7/site-packages/cupy/_creation/from_data.py in asarray(a, dtype, order)
     64 
     65     """
---> 66     return _core.array(a, dtype, False, order)
     67 
     68 

cupy/_core/core.pyx in cupy._core.core.array()

cupy/_core/core.pyx in cupy._core.core.array()

cupy/_core/core.pyx in cupy._core.core._send_object_to_gpu()

ValueError: Unsupported dtype object

Using cuml 21.08.02 installed on Paperspace Gradient via this container:
rapidsai/rapidsai-core:21.08-cuda11.0-runtime-ubuntu18.04-py3.7

github-actions bot commented Jan 8, 2022

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

viclafargue (Contributor) commented:

Thanks for opening the issue. #4481 should solve it.

rapids-bot bot pushed a commit that referenced this issue Jan 18, 2022
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this issue Oct 9, 2023