[FEA] cuML estimators should support non-consecutive labels outside of [0, n) where appropriate #4478

Closed
beckernick opened this issue Jan 12, 2022 · 9 comments · Fixed by #4780
Labels: ? - Needs Triage (Need team to review and classify) · feature request (New feature or request) · inactive-30d

Comments

@beckernick
Member

As noted in rapidsai/cudf#10024, cuML's RandomForestClassifier throws an error if the target column has non-consecutive labels outside of the [0, n) range. This does not happen in scikit-learn, perhaps because it label-encodes the targets under the hood.

This may occur with other estimators as well.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
import cuml

df = pd.DataFrame({
    "x1": [0.0,1,2],
    "x2": [-3,2.0,5],
    "y": [-3, 0, 4.0]
})
clf = RandomForestClassifier()
print(clf.fit(df[["x1", "x2"]], df["y"]))

clf2 = cuml.ensemble.RandomForestClassifier()
print(clf2.fit(df[["x1", "x2"]], df["y"]))
RandomForestClassifier()
/home/nicholasb/conda/envs/rapids-22.02-snow/lib/python3.8/site-packages/cuml/internals/api_decorators.py:567: UserWarning: The number of bins, `n_bins` is greater than the number of samples used for training. Changing `n_bins` to number of training samples.
  ret_val = func(*args, **kwargs)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_8028/3186242508.py in <module>
     12 
     13 clf2 = cuml.ensemble.RandomForestClassifier()
---> 14 print(clf2.fit(df[["x1", "x2"]], df["y"]))

~/conda/envs/rapids-22.02-snow/lib/python3.8/contextlib.py in inner(*args, **kwds)
     73         def inner(*args, **kwds):
     74             with self._recreate_cm():
---> 75                 return func(*args, **kwds)
     76         return inner
     77 

~/conda/envs/rapids-22.02-snow/lib/python3.8/site-packages/cuml/internals/api_decorators.py in inner_with_setters(*args, **kwargs)
    407                                 target_val=target_val)
    408 
--> 409                 return func(*args, **kwargs)
    410 
    411         @wraps(func)

cuml/ensemble/randomforestclassifier.pyx in cuml.ensemble.randomforestclassifier.RandomForestClassifier.fit()

~/conda/envs/rapids-22.02-snow/lib/python3.8/site-packages/cuml/internals/api_decorators.py in inner_set(*args, **kwargs)
    565 
    566                 # Call the function
--> 567                 ret_val = func(*args, **kwargs)
    568 
    569             return cm.process_return(ret_val)

cuml/ensemble/randomforest_common.pyx in cuml.ensemble.randomforest_common.BaseRandomForestModel._dataset_setup_for_fit()

ValueError: The labels need to be consecutive values from 0 to the number of unique label values
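The error message spells out the constraint: the labels must be exactly 0, 1, ..., k-1. Until an estimator handles this itself, one workaround is to encode the targets before calling fit and decode the predictions afterwards. The sketch below uses NumPy only; encode_labels and decode_labels are hypothetical helper names, and casting to int32 is an assumption about what the GPU estimator accepts. scikit-learn's LabelEncoder (fit_transform / inverse_transform) performs the same mapping.

```python
import numpy as np


def encode_labels(y):
    """Hypothetical helper: map arbitrary label values onto 0..n_classes-1."""
    # With return_inverse=True, np.unique also returns, for every element of y,
    # its position within the sorted array of unique values.
    classes, encoded = np.unique(np.asarray(y), return_inverse=True)
    # int32 is an assumed label dtype for the estimator; adjust if needed.
    return classes, encoded.astype(np.int32)


def decode_labels(classes, encoded):
    """Hypothetical helper: map class indices (e.g. predictions) back to labels."""
    return classes[np.asarray(encoded)]


# Labels from the reproducer above.
classes, y_enc = encode_labels([-3, 0, 4.0])
```

With the reproducer, passing y_enc instead of df["y"] to clf2.fit should satisfy the consecutive-label check, and decode_labels(classes, ...) maps the predictions back, assuming predict returns a host array or a pandas Series.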
conda list # packages in environment at /home/nicholasb/conda/envs/rapids-22.02-snow: # # Name Version Build Channel _libgcc_mutex 0.1 conda_forge conda-forge _openmp_mutex 4.5 1_gnu conda-forge abseil-cpp 20210324.2 h9c3ff4c_0 conda-forge aiohttp 3.8.1 py38h497a2fe_0 conda-forge aiosignal 1.2.0 pyhd8ed1ab_0 conda-forge anyio 3.5.0 py38h578d9bd_0 conda-forge appdirs 1.4.4 pyh9f0ad1d_0 conda-forge argon2-cffi 21.3.0 pyhd8ed1ab_0 conda-forge argon2-cffi-bindings 21.2.0 py38h497a2fe_1 conda-forge arrow-cpp 5.0.0 py38h579a05f_22_cuda conda-forge arrow-cpp-proc 3.0.0 cuda conda-forge asn1crypto 1.4.0 pyh9f0ad1d_0 conda-forge async-timeout 4.0.2 pyhd8ed1ab_0 conda-forge async_generator 1.10 py_0 conda-forge attrs 21.4.0 pyhd8ed1ab_0 conda-forge aws-c-cal 0.5.11 h95a6274_0 conda-forge aws-c-common 0.6.2 h7f98852_0 conda-forge aws-c-event-stream 0.2.7 h3541f99_13 conda-forge aws-c-io 0.10.5 hfb6a706_0 conda-forge aws-checksums 0.1.11 ha31a3da_7 conda-forge aws-sdk-cpp 1.8.186 hb4091e7_3 conda-forge babel 2.9.1 pyh44b312d_0 conda-forge backcall 0.2.0 pyh9f0ad1d_0 conda-forge backports 1.0 py_2 conda-forge backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge bleach 4.1.0 pyhd8ed1ab_0 conda-forge blosc 1.21.0 h9c3ff4c_0 conda-forge bokeh 2.4.0 py38h578d9bd_0 conda-forge boost 1.74.0 py38h2b96118_4 conda-forge boost-cpp 1.74.0 h312852a_4 conda-forge brotli 1.0.9 h7f98852_6 conda-forge brotli-bin 1.0.9 h7f98852_6 conda-forge brotlipy 0.7.0 py38h497a2fe_1003 conda-forge brunsli 0.1 h9c3ff4c_0 conda-forge bzip2 1.0.8 h7f98852_4 conda-forge c-ares 1.18.1 h7f98852_0 conda-forge c-blosc2 2.0.4 h5f21a17_1 conda-forge ca-certificates 2021.10.8 ha878542_0 conda-forge cachetools 5.0.0 pyhd8ed1ab_0 conda-forge cairo 1.16.0 h6cf1ce9_1008 conda-forge certifi 2021.10.8 py38h578d9bd_1 conda-forge cffi 1.15.0 py38h3931269_0 conda-forge cfitsio 3.470 hb418390_7 conda-forge charls 2.2.0 h9c3ff4c_0 conda-forge charset-normalizer 2.0.10 pyhd8ed1ab_0 conda-forge click 8.0.3 py38h578d9bd_1 
conda-forge click-plugins 1.1.1 py_0 conda-forge cligj 0.7.2 pyhd8ed1ab_1 conda-forge cloudpickle 2.0.0 pyhd8ed1ab_0 conda-forge colorama 0.4.4 pyh9f0ad1d_0 conda-forge colorcet 3.0.0 pyhd8ed1ab_0 conda-forge cryptography 35.0.0 py38h3e25421_2 conda-forge cucim 22.02.00a220111 cuda_11_py38_gab8e6a4_31 rapidsai-nightly cuda-python 11.5.0 py38h3fd9d12_0 nvidia cudatoolkit 11.2.72 h2bc3f7f_0 nvidia cudf 22.02.00a220111 cuda_11_py38_g951f630dfe_250 rapidsai-nightly cudf_kafka 22.02.00a220111 py38_g951f630dfe_250 rapidsai-nightly cugraph 22.02.00a220111 cuda11_py38_g6883cc19_66 rapidsai-nightly cuml 22.02.00a220111 cuda11_py38_g416ce61a4_84 rapidsai-nightly cupy 9.6.0 py38h177b0fd_0 conda-forge curl 7.81.0 h2574ce0_0 conda-forge cusignal 22.02.00a220111 py38_g6a02566_9 rapidsai-nightly cuspatial 22.02.00a220110 py38_g55280e3_14 rapidsai-nightly custreamz 22.02.00a220111 py38_g951f630dfe_250 rapidsai-nightly cuxfilter 22.02.00a220111 py38_g7c4dc24_7 rapidsai-nightly cycler 0.11.0 pyhd8ed1ab_0 conda-forge cyrus-sasl 2.1.27 h230043b_5 conda-forge cytoolz 0.11.2 py38h497a2fe_1 conda-forge dask 2021.11.2 pyhd8ed1ab_0 conda-forge dask-core 2021.11.2 pyhd8ed1ab_0 conda-forge dask-cuda 22.02.00a220111 py38_45 rapidsai-nightly dask-cudf 22.02.00a220111 cuda_11_py38_g951f630dfe_250 rapidsai-nightly dask-snowflake 0.0.2 pyhd8ed1ab_0 conda-forge datashader 0.11.1 pyh9f0ad1d_0 conda-forge datashape 0.5.4 py_1 conda-forge debugpy 1.5.1 py38h709712a_0 conda-forge decorator 5.1.1 pyhd8ed1ab_0 conda-forge defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge distributed 2021.11.2 py38h578d9bd_0 conda-forge dlpack 0.5 h9c3ff4c_0 conda-forge entrypoints 0.3 py38h32f6830_1002 conda-forge expat 2.4.2 h9c3ff4c_0 conda-forge faiss-proc 1.0.0 cuda conda-forge fastavro 1.4.9 py38h497a2fe_0 conda-forge fastrlock 0.8 py38h709712a_1 conda-forge fiona 1.8.20 py38hbb147eb_2 conda-forge flit-core 3.6.0 pyhd8ed1ab_0 conda-forge font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge font-ttf-inconsolata 3.000 
h77eed37_0 conda-forge font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge font-ttf-ubuntu 0.83 hab24e00_0 conda-forge fontconfig 2.13.1 hba837de_1005 conda-forge fonts-conda-ecosystem 1 0 conda-forge fonts-conda-forge 1 0 conda-forge fonttools 4.28.5 py38h497a2fe_0 conda-forge freetype 2.10.4 h0708190_1 conda-forge freexl 1.0.6 h7f98852_0 conda-forge frozenlist 1.2.0 py38h497a2fe_1 conda-forge fsspec 2021.11.1 pyhd8ed1ab_0 conda-forge gdal 3.3.2 py38h81a01a0_3 conda-forge geopandas 0.9.0 pyhd8ed1ab_1 conda-forge geopandas-base 0.9.0 pyhd8ed1ab_1 conda-forge geos 3.9.1 h9c3ff4c_2 conda-forge geotiff 1.7.0 h08e826d_2 conda-forge gettext 0.19.8.1 h73d1719_1008 conda-forge gflags 2.2.2 he1b5a44_1004 conda-forge giflib 5.2.1 h36c2ea0_2 conda-forge glog 0.5.0 h48cff8f_0 conda-forge gmp 6.2.1 h58526e2_0 conda-forge greenlet 1.1.2 py38h709712a_1 conda-forge grpc-cpp 1.42.0 ha1441d3_1 conda-forge hdf4 4.2.15 h10796ff_3 conda-forge hdf5 1.12.1 nompi_h2750804_103 conda-forge heapdict 1.0.1 py_0 conda-forge icu 68.2 h9c3ff4c_0 conda-forge idna 3.1 pyhd3deb0d_0 conda-forge imagecodecs 2021.8.26 py38hb5ce8f7_1 conda-forge imageio 2.13.5 pyh239f2a4_0 conda-forge importlib-metadata 4.10.0 py38h578d9bd_0 conda-forge importlib_metadata 4.10.0 hd8ed1ab_0 conda-forge importlib_resources 5.4.0 pyhd8ed1ab_0 conda-forge ipykernel 6.6.1 py38he5a9106_0 conda-forge ipython 7.31.0 py38h578d9bd_0 conda-forge ipython_genutils 0.2.0 py_1 conda-forge ipywidgets 7.6.5 pyhd8ed1ab_0 conda-forge jbig 2.1 h7f98852_2003 conda-forge jedi 0.18.1 py38h578d9bd_0 conda-forge jinja2 3.0.3 pyhd8ed1ab_0 conda-forge joblib 1.1.0 pyhd8ed1ab_0 conda-forge jpeg 9d h36c2ea0_0 conda-forge json-c 0.15 h98cffda_0 conda-forge json5 0.9.5 pyh9f0ad1d_0 conda-forge jsonschema 4.3.3 pyhd8ed1ab_0 conda-forge jupyter-server-proxy 3.2.0 pyhd8ed1ab_0 conda-forge jupyter_client 7.1.0 pyhd8ed1ab_0 conda-forge jupyter_core 4.9.1 py38h578d9bd_1 conda-forge jupyter_server 1.13.1 pyhd8ed1ab_0 conda-forge jupyterlab 3.2.6 
pyhd8ed1ab_0 conda-forge jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge jupyterlab_server 2.10.3 pyhd8ed1ab_0 conda-forge jupyterlab_widgets 1.0.2 pyhd8ed1ab_0 conda-forge jxrlib 1.1 h7f98852_2 conda-forge kealib 1.4.14 h87e4c3c_3 conda-forge kiwisolver 1.3.2 py38h1fd1430_1 conda-forge krb5 1.19.2 hcc1bbae_3 conda-forge lcms2 2.12 hddcbb42_0 conda-forge ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge lerc 3.0 h9c3ff4c_0 conda-forge libaec 1.0.6 h9c3ff4c_0 conda-forge libblas 3.9.0 12_linux64_openblas conda-forge libbrotlicommon 1.0.9 h7f98852_6 conda-forge libbrotlidec 1.0.9 h7f98852_6 conda-forge libbrotlienc 1.0.9 h7f98852_6 conda-forge libcblas 3.9.0 12_linux64_openblas conda-forge libcucim 22.02.00a220111 cuda11_gab8e6a4_31 rapidsai-nightly libcudf 22.02.00a220111 cuda11_g951f630dfe_250 rapidsai-nightly libcudf_kafka 22.02.00a220111 g951f630dfe_250 rapidsai-nightly libcugraph 22.02.00a220111 cuda11_g6883cc19_66 rapidsai-nightly libcugraph_etl 22.02.00a220111 cuda11_g6883cc19_66 rapidsai-nightly libcuml 22.02.00a220111 cuda11_g416ce61a4_84 rapidsai-nightly libcumlprims 22.02.00a220106 cuda11_g06a42b1_14 rapidsai-nightly libcurl 7.81.0 h2574ce0_0 conda-forge libcusolver 11.3.2.107 hc875929_0 nvidia libcuspatial 22.02.00a220110 cuda11_g55280e3_14 rapidsai-nightly libdap4 3.20.6 hd7c4107_2 conda-forge libdeflate 1.8 h7f98852_0 conda-forge libedit 3.1.20191231 he28a2e2_2 conda-forge libev 4.33 h516909a_1 conda-forge libevent 2.1.10 h9b69904_4 conda-forge libfaiss 1.7.0 cuda112h5bea7ad_8_cuda conda-forge libffi 3.4.2 h7f98852_5 conda-forge libgcc-ng 11.2.0 h1d223b6_11 conda-forge libgcrypt 1.9.4 h7f98852_0 conda-forge libgdal 3.3.2 h6acdded_3 conda-forge libgfortran-ng 11.2.0 h69a702a_11 conda-forge libgfortran5 11.2.0 h5c6108e_11 conda-forge libglib 2.70.2 h174f98d_1 conda-forge libgomp 11.2.0 h1d223b6_11 conda-forge libgpg-error 1.42 h9c3ff4c_0 conda-forge libgsasl 1.10.0 h5b4c23d_0 conda-forge libhwloc 2.3.0 h5e5b7d1_1 conda-forge libiconv 1.16 h516909a_0 
conda-forge libkml 1.3.0 h238a007_1014 conda-forge liblapack 3.9.0 12_linux64_openblas conda-forge libllvm11 11.1.0 hf817b99_2 conda-forge libnetcdf 4.8.1 nompi_hb3fd0d9_101 conda-forge libnghttp2 1.43.0 h812cca2_1 conda-forge libnsl 2.0.0 h7f98852_0 conda-forge libntlm 1.4 h7f98852_1002 conda-forge libopenblas 0.3.18 pthreads_h8fe5266_0 conda-forge libpng 1.6.37 h21135ba_2 conda-forge libpq 13.5 hd57d9b9_1 conda-forge libprotobuf 3.19.2 h780b84a_0 conda-forge librdkafka 1.7.0 hc49e61c_1 conda-forge librmm 22.02.00a220111 cuda11_g5a239d2_25 rapidsai-nightly librttopo 1.1.0 h1185371_6 conda-forge libsodium 1.0.18 h36c2ea0_1 conda-forge libspatialindex 1.9.3 h9c3ff4c_4 conda-forge libspatialite 5.0.1 h5cf074c_8 conda-forge libssh2 1.10.0 ha56f1ee_2 conda-forge libstdcxx-ng 11.2.0 he4da1e4_11 conda-forge libthrift 0.15.0 he6d91bd_1 conda-forge libtiff 4.3.0 h6f004c6_2 conda-forge libutf8proc 2.7.0 h7f98852_0 conda-forge libuuid 2.32.1 h7f98852_1000 conda-forge libuv 1.42.0 h7f98852_0 conda-forge libwebp 1.2.1 h3452ae3_0 conda-forge libwebp-base 1.2.1 h7f98852_0 conda-forge libxcb 1.13 h7f98852_1004 conda-forge libxgboost 1.5.0dev.rapidsai22.02 cuda11.2_0 rapidsai-nightly libxml2 2.9.12 h72842e0_0 conda-forge libzip 1.8.0 h4de3113_1 conda-forge libzlib 1.2.11 h36c2ea0_1013 conda-forge libzopfli 1.0.3 h9c3ff4c_0 conda-forge llvmlite 0.37.0 py38h4630a5e_1 conda-forge locket 0.2.0 py_2 conda-forge lz4-c 1.9.3 h9c3ff4c_1 conda-forge mapclassify 2.4.3 pyhd8ed1ab_0 conda-forge markdown 3.3.6 pyhd8ed1ab_0 conda-forge markupsafe 2.0.1 py38h497a2fe_1 conda-forge matplotlib-base 3.5.1 py38hf4fb855_0 conda-forge matplotlib-inline 0.1.3 pyhd8ed1ab_0 conda-forge mistune 0.8.4 py38h497a2fe_1005 conda-forge msgpack-python 1.0.3 py38h1fd1430_0 conda-forge multidict 5.2.0 py38h497a2fe_1 conda-forge multipledispatch 0.6.0 py_0 conda-forge munch 2.5.0 py_0 conda-forge munkres 1.1.4 pyh9f0ad1d_0 conda-forge nbclassic 0.3.4 pyhd8ed1ab_0 conda-forge nbclient 0.5.9 pyhd8ed1ab_0 conda-forge 
nbconvert 6.4.0 py38h578d9bd_0 conda-forge nbformat 5.1.3 pyhd8ed1ab_0 conda-forge nccl 2.11.4.1 hdc17891_0 conda-forge ncurses 6.2 h58526e2_4 conda-forge nest-asyncio 1.5.4 pyhd8ed1ab_0 conda-forge networkx 2.6.3 pyhd8ed1ab_1 conda-forge nodejs 14.17.4 h92b4a50_0 conda-forge notebook 6.4.6 pyha770c72_0 conda-forge nspr 4.32 h9c3ff4c_1 conda-forge nss 3.74 hb5efdd6_0 conda-forge numba 0.54.1 py38h4bf6c61_0 conda-forge numpy 1.20.3 py38h9894fe3_1 conda-forge nvtx 0.2.3 py38h497a2fe_1 conda-forge olefile 0.46 pyh9f0ad1d_1 conda-forge openjpeg 2.4.0 hb52868f_1 conda-forge openssl 1.1.1l h7f98852_0 conda-forge orc 1.7.2 h1be678f_0 conda-forge oscrypto 1.2.1 pyhd3deb0d_0 conda-forge packaging 21.3 pyhd8ed1ab_0 conda-forge pandas 1.3.5 py38h43a58ef_0 conda-forge pandoc 2.16.2 h7f98852_0 conda-forge pandocfilters 1.5.0 pyhd8ed1ab_0 conda-forge panel 0.12.4 pyhd8ed1ab_0 conda-forge param 1.12.0 pyh6c4a22f_0 conda-forge parquet-cpp 1.5.1 1 conda-forge parso 0.8.3 pyhd8ed1ab_0 conda-forge partd 1.2.0 pyhd8ed1ab_0 conda-forge pcre 8.45 h9c3ff4c_0 conda-forge pexpect 4.8.0 py38h32f6830_1 conda-forge pickleshare 0.7.5 py38h32f6830_1002 conda-forge pillow 8.4.0 py38h8e6f84c_0 conda-forge pip 21.3.1 pyhd8ed1ab_0 conda-forge pixman 0.40.0 h36c2ea0_0 conda-forge pooch 1.5.2 pyhd8ed1ab_0 conda-forge poppler 21.09.0 ha39eefc_3 conda-forge poppler-data 0.4.11 hd8ed1ab_0 conda-forge postgresql 13.5 h2510834_1 conda-forge proj 8.1.0 h277dcde_1 conda-forge prometheus_client 0.12.0 pyhd8ed1ab_0 conda-forge prompt-toolkit 3.0.24 pyha770c72_0 conda-forge protobuf 3.19.2 py38h709712a_0 conda-forge psutil 5.9.0 py38h497a2fe_0 conda-forge pthread-stubs 0.4 h36c2ea0_1001 conda-forge ptxcompiler 0.2.0 py38hb739d79_0 rapidsai-nightly ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge py-xgboost 1.5.0dev.rapidsai22.02 cuda11.2py38_0 rapidsai-nightly pyarrow 5.0.0 py38ha746e9d_22_cuda conda-forge pycparser 2.21 pyhd8ed1ab_0 conda-forge pycryptodomex 3.12.0 py38h497a2fe_0 conda-forge pyct 0.4.6 py_0 
conda-forge pyct-core 0.4.6 py_0 conda-forge pydeck 0.5.0 pyh9f0ad1d_0 conda-forge pyee 8.1.0 pyh9f0ad1d_0 conda-forge pygments 2.11.2 pyhd8ed1ab_0 conda-forge pyjwt 2.3.0 pyhd8ed1ab_1 conda-forge pylibcugraph 22.02.00a220111 cuda11_py38_g6883cc19_66 rapidsai-nightly pynvml 11.4.1 pyhd8ed1ab_0 conda-forge pyopenssl 21.0.0 pyhd8ed1ab_0 conda-forge pyparsing 3.0.6 pyhd8ed1ab_0 conda-forge pyppeteer 0.2.6 pyhd8ed1ab_0 conda-forge pyproj 3.1.0 py38h3701b11_4 conda-forge pyrsistent 0.18.0 py38h497a2fe_0 conda-forge pysocks 1.7.1 py38h578d9bd_4 conda-forge python 3.8.12 hb7a2778_2_cpython conda-forge python-confluent-kafka 1.7.0 py38h497a2fe_2 conda-forge python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge python_abi 3.8 2_cp38 conda-forge pytz 2021.3 pyhd8ed1ab_0 conda-forge pyviz_comms 2.1.0 pyhd8ed1ab_0 conda-forge pywavelets 1.2.0 py38h6c62de6_1 conda-forge pyyaml 6.0 py38h497a2fe_3 conda-forge pyzmq 22.3.0 py38h2035c66_1 conda-forge rapids 22.02.00a220111 cuda11.2_py38_g365c37f_104 rapidsai-nightly rapids-xgboost 22.02.00a220111 cuda11.2_py38_g365c37f_104 rapidsai-nightly re2 2021.11.01 h9c3ff4c_0 conda-forge readline 8.1 h46c0cb4_0 conda-forge requests 2.27.1 pyhd8ed1ab_0 conda-forge rmm 22.02.00a220111 cuda11_py38_g5a239d2_25_has_cma rapidsai-nightly rtree 0.9.7 py38h02d302b_3 conda-forge s2n 1.0.10 h9b69904_0 conda-forge scikit-image 0.18.1 py38h51da96c_0 conda-forge scikit-learn 1.0.2 py38h1561384_0 conda-forge scipy 1.7.3 py38h56a6a73_0 conda-forge send2trash 1.8.0 pyhd8ed1ab_0 conda-forge setuptools 60.5.0 py38h578d9bd_0 conda-forge shapely 1.8.0 py38hb7fe4a8_0 conda-forge simpervisor 0.4 pyhd8ed1ab_0 conda-forge six 1.16.0 pyh6c4a22f_0 conda-forge snappy 1.1.8 he1b5a44_3 conda-forge sniffio 1.2.0 py38h578d9bd_2 conda-forge snowflake-connector-python 2.7.2 py38h8914348_0 conda-forge snowflake-sqlalchemy 1.3.3 pyhd8ed1ab_0 conda-forge sortedcontainers 2.4.0 pyhd8ed1ab_0 conda-forge spdlog 1.8.5 h4bd325d_0 conda-forge sqlalchemy 1.4.29 py38h497a2fe_0 conda-forge 
sqlite 3.37.0 h9cd32fc_0 conda-forge streamz 0.6.3 pyh6c4a22f_0 conda-forge tblib 1.7.0 pyhd8ed1ab_0 conda-forge terminado 0.12.1 py38h578d9bd_1 conda-forge testpath 0.5.0 pyhd8ed1ab_0 conda-forge threadpoolctl 3.0.0 pyh8a188c0_0 conda-forge tifffile 2021.11.2 pyhd8ed1ab_0 conda-forge tiledb 2.3.4 he87e0bf_0 conda-forge tk 8.6.11 h27826a3_1 conda-forge toolz 0.11.2 pyhd8ed1ab_0 conda-forge tornado 6.1 py38h497a2fe_2 conda-forge tqdm 4.62.3 pyhd8ed1ab_0 conda-forge traitlets 5.1.1 pyhd8ed1ab_0 conda-forge treelite 2.1.0 py38hdd725b4_0 conda-forge treelite-runtime 2.1.0 pypi_0 pypi typing-extensions 4.0.1 hd8ed1ab_0 conda-forge typing_extensions 4.0.1 pyha770c72_0 conda-forge tzcode 2021e h7f98852_0 conda-forge tzdata 2021e he74cb21_0 conda-forge ucx 1.11.2+gef2bbcf cuda11.2_0 rapidsai-nightly ucx-proc 1.0.0 gpu rapidsai-nightly ucx-py 0.24.0a220111 py38_gef2bbcf_24 rapidsai-nightly unicodedata2 14.0.0 py38h497a2fe_0 conda-forge urllib3 1.26.8 pyhd8ed1ab_1 conda-forge wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge webencodings 0.5.1 py_1 conda-forge websocket-client 1.2.3 pyhd8ed1ab_0 conda-forge websockets 9.1 py38h497a2fe_0 conda-forge wheel 0.37.1 pyhd8ed1ab_0 conda-forge widgetsnbextension 3.5.2 py38h578d9bd_1 conda-forge xarray 0.20.2 pyhd8ed1ab_0 conda-forge xerces-c 3.2.3 h9d8b166_3 conda-forge xgboost 1.5.0dev.rapidsai22.02 cuda11.2py38_0 rapidsai-nightly xorg-kbproto 1.0.7 h7f98852_1002 conda-forge xorg-libice 1.0.10 h7f98852_0 conda-forge xorg-libsm 1.2.3 hd9c2040_1000 conda-forge xorg-libx11 1.7.2 h7f98852_0 conda-forge xorg-libxau 1.0.9 h7f98852_0 conda-forge xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge xorg-libxext 1.3.4 h7f98852_1 conda-forge xorg-libxrender 0.9.10 h7f98852_1003 conda-forge xorg-renderproto 0.11.1 h7f98852_1002 conda-forge xorg-xextproto 7.3.0 h7f98852_1002 conda-forge xorg-xproto 7.0.31 h7f98852_1007 conda-forge xz 5.2.5 h516909a_1 conda-forge yaml 0.2.5 h7f98852_2 conda-forge yarl 1.7.2 py38h497a2fe_1 conda-forge zeromq 4.3.4 h9c3ff4c_1 
conda-forge zfp 0.5.5 h9c3ff4c_8 conda-forge zict 2.0.0 py_0 conda-forge zipp 3.7.0 pyhd8ed1ab_0 conda-forge zlib 1.2.11 h36c2ea0_1013 conda-forge zstd 1.5.1 ha95c52a_0 conda-forge
@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@jhancock1975

jhancock1975 commented Mar 19, 2022

I ran into this too; I'm working with pandas DataFrames.

After using the imblearn RandomUnderSampler, my binary labels were sorted, i.e., all 0s followed by all 1s.

I used DataFrame.sample to shuffle the data, and I also had to reset the index to make the error go away. Here is the snippet:

# rus is an imblearn.under_sampling.RandomUnderSampler instance
X_train, y_train = rus.fit_resample(X_train, y_train)
# After fit_resample, the labels are sorted. When the data is sharded,
# some estimators receive only one label value (e.g., all 1s), which causes
# "The labels need to be consecutive values from 0 to the number of unique label values".
# So we recombine X and y into one DataFrame, shuffle it, and split it back up.
X_train['y'] = y_train
X_train = X_train.sample(frac=1.0).reset_index(drop=True)
y_train = X_train['y']
X_train.drop('y', axis=1, inplace=True)
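The comments in this snippet describe the mechanism: with sorted labels and contiguous partitions, one partition can end up holding only 1s. The sketch below reproduces that with NumPy, assuming rows are split into contiguous partitions; labels_are_consecutive is a hypothetical restatement of the rule in the error message, not cuML's actual check.

```python
import numpy as np


def labels_are_consecutive(y):
    """Hypothetical check: are the unique labels exactly 0, 1, ..., k-1?"""
    uniq = np.unique(np.asarray(y))
    return np.array_equal(uniq, np.arange(uniq.size))


# Binary labels sorted the way the resampler returned them: all 0s, then all 1s.
y_sorted = np.array([0] * 4 + [1] * 4)

# Assumed sharding scheme: contiguous row-wise partitions, one per worker.
shards = np.array_split(y_sorted, 2)
print([labels_are_consecutive(s) for s in shards])  # [True, False]
```

The first shard contains only 0s and passes, while the second contains only 1s and fails. Shuffling first, as in the snippet above, makes a single-label partition unlikely for large partitions but does not rule it out for small ones.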

@beckernick
Member Author

beckernick commented Mar 21, 2022

Generalizing from @jhancock1975's code snippet: this error can occur even when the full label column is already consecutive from 0, for example during cross-validation, if a training fold does not contain every label value.

import pandas as pd
import cuml
from sklearn.model_selection import cross_val_score

df = pd.DataFrame({
    "x1": [0.0,1,2,3,3,4],
    "x2": [-3,2.0,5,5,3,2],
    "y": [0,1,1,1,2,2]
})

clf = cuml.ensemble.RandomForestClassifier()

cross_val_score(
    clf,
    df[["x1", "x2"]],
    df["y"],
    cv=2,
    error_score="raise"
)
# Raises the ValueError shown above
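To see which fold breaks, the split can be inspected directly. This sketch assumes cross_val_score uses StratifiedKFold here (scikit-learn's default for classifiers when cv is an integer); a plain KFold split fails the same way on this data. Class 0 has a single sample, so whichever fold holds it in its test set leaves a training set whose labels are only 1 and 2.

```python
import warnings

import numpy as np
from sklearn.model_selection import StratifiedKFold

y = np.array([0, 1, 1, 1, 2, 2])  # labels from the example above
X = np.zeros((y.size, 1))         # feature values do not affect the split

with warnings.catch_warnings():
    # scikit-learn warns because class 0 has fewer members than n_splits
    warnings.simplefilter("ignore", UserWarning)
    splits = list(StratifiedKFold(n_splits=2).split(X, y))

for fold, (train_idx, _) in enumerate(splits):
    train_labels = np.unique(y[train_idx])
    consecutive = np.array_equal(train_labels, np.arange(train_labels.size))
    print(fold, train_labels, "ok" if consecutive else "not consecutive from 0")
```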

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@beckernick
Member Author

It should be possible to do this using cuml.prims.label.make_monotonic, which was designed for this use case.

This function looks like it might have an expensive JIT cost, though. Perhaps the C++ primitive

void make_monotonic(Type* out, Type* in, size_t N, cudaStream_t stream)

is relevant here?
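For reference, the relabeling discussed here (arbitrary values onto 0..k-1) can be expressed as a sort-and-deduplicate step followed by a binary search of each label in the resulting class table; both steps are data-parallel. The sketch below illustrates that pattern in NumPy. make_monotonic_sketch is a hypothetical name, it is not the cuML or RAFT implementation, and it assumes that a supplied classes array is sorted. Accepting a fixed class table lets a second label array (for example, a validation fold) be encoded against the same table used at fit time.

```python
import numpy as np


def make_monotonic_sketch(labels, classes=None):
    """Hypothetical helper: map labels onto 0..len(classes)-1.

    If classes is given, it must be sorted and contain every label value.
    Returns (indices, classes).
    """
    labels = np.asarray(labels)
    if classes is None:
        # Step 1: sort and deduplicate to build the class table.
        classes = np.unique(labels)
    classes = np.asarray(classes)
    if labels.size and classes.size == 0:
        raise ValueError("empty class table")
    # Step 2: binary-search each label in the sorted class table.
    idx = np.searchsorted(classes, labels)
    # searchsorted returns an insertion point even for absent values,
    # so confirm that every label was actually found.
    found = classes[np.minimum(idx, classes.size - 1)] == labels
    if not np.all(found):
        raise ValueError("labels contain values that are not in classes")
    return idx, classes


idx, classes = make_monotonic_sketch([4, -3, 0, 4])
```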

@cjnolet
Member

cjnolet commented May 5, 2022

@beckernick, yep, this is something we originally wrote in C++ in cuML for DBSCAN and have since moved to RAFT (it is pending removal from cuML; the RAFT version is more up to date). If desired, this could also be a good reason to expose it through pylibraft and use it in cuML, which would make it reusable and keep a clean separation of implementation details.

@beckernick
Member Author

That sounds like a good solution. I suspect this non-consecutive label issue will keep coming up.

I'll file a new issue on RAFT, cross-link it here, and mark it as a good first issue based on your description.

@beckernick beckernick changed the title [FEA] cuML estimators should support non-consecutive labels ouside of [0, n) where appropriate [FEA] cuML estimators should support non-consecutive labels outside of [0, n) where appropriate May 5, 2022
@github-actions

github-actions bot commented Jul 3, 2022

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

rapids-bot bot pushed a commit that referenced this issue Sep 29, 2022
…ive labels where appropriate (#4780)

This PR closes #4478 by transforming non-consecutive labels outside of [0,n) to consecutive labels inside [0,n) similar to what Scikit-learn does under the hood.

Closes #691

Authors:
  - https://github.com/VamsiTallam95

Approvers:
  - Micka (https://github.com/lowener)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #4780
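The behavior the fix aims to match can be checked in scikit-learn directly: a forest classifier keeps the original label values in classes_ and returns predictions in that original label space, while training on class indices internally. The sketch below reuses the data from the reproducer at the top of the issue; n_estimators and random_state are arbitrary choices for a small, deterministic run.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Same data as the reproducer at the top of this issue.
X = np.array([[0.0, -3.0], [1.0, 2.0], [2.0, 5.0]])
y = np.array([-3, 0, 4.0])

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
print(clf.classes_)       # original label values, sorted
print(clf.predict(X))     # predictions are reported in the original label space
```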
jakirkham pushed a commit to jakirkham/cuml that referenced this issue Feb 27, 2023
3 participants