[FEA] cuML estimators should support non-consecutive labels outside of [0, n) where appropriate #4478

beckernick · 2022-01-12T14:07:01Z

As noted in rapidsai/cudf#10024, cuML RandomForestClassifier throws an error if the target column has non-consecutive labels outside of the [0, n) range. This does not occur in scikit-learn, perhaps because label encoding happens under the hood.

This may occur with other estimators as well.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
import cuml

df = pd.DataFrame({
    "x1": [0.0,1,2],
    "x2": [-3,2.0,5],
    "y": [-3, 0, 4.0]
})
clf = RandomForestClassifier()
print(clf.fit(df[["x1", "x2"]], df["y"]))

clf2 = cuml.ensemble.RandomForestClassifier()
print(clf2.fit(df[["x1", "x2"]], df["y"]))
RandomForestClassifier()
/home/nicholasb/conda/envs/rapids-22.02-snow/lib/python3.8/site-packages/cuml/internals/api_decorators.py:567: UserWarning: The number of bins, `n_bins` is greater than the number of samples used for training. Changing `n_bins` to number of training samples.
  ret_val = func(*args, **kwargs)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_8028/3186242508.py in <module>
     12 
     13 clf2 = cuml.ensemble.RandomForestClassifier()
---> 14 print(clf2.fit(df[["x1", "x2"]], df["y"]))

~/conda/envs/rapids-22.02-snow/lib/python3.8/contextlib.py in inner(*args, **kwds)
     73         def inner(*args, **kwds):
     74             with self._recreate_cm():
---> 75                 return func(*args, **kwds)
     76         return inner
     77 

~/conda/envs/rapids-22.02-snow/lib/python3.8/site-packages/cuml/internals/api_decorators.py in inner_with_setters(*args, **kwargs)
    407                                 target_val=target_val)
    408 
--> 409                 return func(*args, **kwargs)
    410 
    411         @wraps(func)

cuml/ensemble/randomforestclassifier.pyx in cuml.ensemble.randomforestclassifier.RandomForestClassifier.fit()

~/conda/envs/rapids-22.02-snow/lib/python3.8/site-packages/cuml/internals/api_decorators.py in inner_set(*args, **kwargs)
    565 
    566                 # Call the function
--> 567                 ret_val = func(*args, **kwargs)
    568 
    569             return cm.process_return(ret_val)

cuml/ensemble/randomforest_common.pyx in cuml.ensemble.randomforest_common.BaseRandomForestModel._dataset_setup_for_fit()

ValueError: The labels need to be consecutive values from 0 to the number of unique label values
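
For anyone hitting this, a quick way to check whether a target column meets the requirement stated in the error message is sketched below. This is a NumPy approximation of the condition, not the actual cuML validation code, and the helper name `labels_are_consecutive` is hypothetical.

```python
import numpy as np


def labels_are_consecutive(y):
    """Return True if the unique values of y are exactly 0, 1, ..., n-1.

    Hypothetical helper: it approximates the condition described in the
    error message and is not taken from the cuML source.
    """
    classes = np.unique(np.asarray(y))
    return bool(np.array_equal(classes, np.arange(len(classes))))


# Labels from the reproducer above: rejected because they are not 0..n-1.
print(labels_are_consecutive([-3, 0, 4.0]))

# Labels that already satisfy the requirement.
print(labels_are_consecutive([0, 1, 2]))
```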

conda list # packages in environment at /home/nicholasb/conda/envs/rapids-22.02-snow: # # Name Version Build Channel _libgcc_mutex 0.1 conda_forge conda-forge _openmp_mutex 4.5 1_gnu conda-forge abseil-cpp 20210324.2 h9c3ff4c_0 conda-forge aiohttp 3.8.1 py38h497a2fe_0 conda-forge aiosignal 1.2.0 pyhd8ed1ab_0 conda-forge anyio 3.5.0 py38h578d9bd_0 conda-forge appdirs 1.4.4 pyh9f0ad1d_0 conda-forge argon2-cffi 21.3.0 pyhd8ed1ab_0 conda-forge argon2-cffi-bindings 21.2.0 py38h497a2fe_1 conda-forge arrow-cpp 5.0.0 py38h579a05f_22_cuda conda-forge arrow-cpp-proc 3.0.0 cuda conda-forge asn1crypto 1.4.0 pyh9f0ad1d_0 conda-forge async-timeout 4.0.2 pyhd8ed1ab_0 conda-forge async_generator 1.10 py_0 conda-forge attrs 21.4.0 pyhd8ed1ab_0 conda-forge aws-c-cal 0.5.11 h95a6274_0 conda-forge aws-c-common 0.6.2 h7f98852_0 conda-forge aws-c-event-stream 0.2.7 h3541f99_13 conda-forge aws-c-io 0.10.5 hfb6a706_0 conda-forge aws-checksums 0.1.11 ha31a3da_7 conda-forge aws-sdk-cpp 1.8.186 hb4091e7_3 conda-forge babel 2.9.1 pyh44b312d_0 conda-forge backcall 0.2.0 pyh9f0ad1d_0 conda-forge backports 1.0 py_2 conda-forge backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge bleach 4.1.0 pyhd8ed1ab_0 conda-forge blosc 1.21.0 h9c3ff4c_0 conda-forge bokeh 2.4.0 py38h578d9bd_0 conda-forge boost 1.74.0 py38h2b96118_4 conda-forge boost-cpp 1.74.0 h312852a_4 conda-forge brotli 1.0.9 h7f98852_6 conda-forge brotli-bin 1.0.9 h7f98852_6 conda-forge brotlipy 0.7.0 py38h497a2fe_1003 conda-forge brunsli 0.1 h9c3ff4c_0 conda-forge bzip2 1.0.8 h7f98852_4 conda-forge c-ares 1.18.1 h7f98852_0 conda-forge c-blosc2 2.0.4 h5f21a17_1 conda-forge ca-certificates 2021.10.8 ha878542_0 conda-forge cachetools 5.0.0 pyhd8ed1ab_0 conda-forge cairo 1.16.0 h6cf1ce9_1008 conda-forge certifi 2021.10.8 py38h578d9bd_1 conda-forge cffi 1.15.0 py38h3931269_0 conda-forge cfitsio 3.470 hb418390_7 conda-forge charls 2.2.0 h9c3ff4c_0 conda-forge charset-normalizer 2.0.10 pyhd8ed1ab_0 conda-forge click 8.0.3 py38h578d9bd_1 
conda-forge click-plugins 1.1.1 py_0 conda-forge cligj 0.7.2 pyhd8ed1ab_1 conda-forge cloudpickle 2.0.0 pyhd8ed1ab_0 conda-forge colorama 0.4.4 pyh9f0ad1d_0 conda-forge colorcet 3.0.0 pyhd8ed1ab_0 conda-forge cryptography 35.0.0 py38h3e25421_2 conda-forge cucim 22.02.00a220111 cuda_11_py38_gab8e6a4_31 rapidsai-nightly cuda-python 11.5.0 py38h3fd9d12_0 nvidia cudatoolkit 11.2.72 h2bc3f7f_0 nvidia cudf 22.02.00a220111 cuda_11_py38_g951f630dfe_250 rapidsai-nightly cudf_kafka 22.02.00a220111 py38_g951f630dfe_250 rapidsai-nightly cugraph 22.02.00a220111 cuda11_py38_g6883cc19_66 rapidsai-nightly cuml 22.02.00a220111 cuda11_py38_g416ce61a4_84 rapidsai-nightly cupy 9.6.0 py38h177b0fd_0 conda-forge curl 7.81.0 h2574ce0_0 conda-forge cusignal 22.02.00a220111 py38_g6a02566_9 rapidsai-nightly cuspatial 22.02.00a220110 py38_g55280e3_14 rapidsai-nightly custreamz 22.02.00a220111 py38_g951f630dfe_250 rapidsai-nightly cuxfilter 22.02.00a220111 py38_g7c4dc24_7 rapidsai-nightly cycler 0.11.0 pyhd8ed1ab_0 conda-forge cyrus-sasl 2.1.27 h230043b_5 conda-forge cytoolz 0.11.2 py38h497a2fe_1 conda-forge dask 2021.11.2 pyhd8ed1ab_0 conda-forge dask-core 2021.11.2 pyhd8ed1ab_0 conda-forge dask-cuda 22.02.00a220111 py38_45 rapidsai-nightly dask-cudf 22.02.00a220111 cuda_11_py38_g951f630dfe_250 rapidsai-nightly dask-snowflake 0.0.2 pyhd8ed1ab_0 conda-forge datashader 0.11.1 pyh9f0ad1d_0 conda-forge datashape 0.5.4 py_1 conda-forge debugpy 1.5.1 py38h709712a_0 conda-forge decorator 5.1.1 pyhd8ed1ab_0 conda-forge defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge distributed 2021.11.2 py38h578d9bd_0 conda-forge dlpack 0.5 h9c3ff4c_0 conda-forge entrypoints 0.3 py38h32f6830_1002 conda-forge expat 2.4.2 h9c3ff4c_0 conda-forge faiss-proc 1.0.0 cuda conda-forge fastavro 1.4.9 py38h497a2fe_0 conda-forge fastrlock 0.8 py38h709712a_1 conda-forge fiona 1.8.20 py38hbb147eb_2 conda-forge flit-core 3.6.0 pyhd8ed1ab_0 conda-forge font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge font-ttf-inconsolata 3.000 
h77eed37_0 conda-forge font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge font-ttf-ubuntu 0.83 hab24e00_0 conda-forge fontconfig 2.13.1 hba837de_1005 conda-forge fonts-conda-ecosystem 1 0 conda-forge fonts-conda-forge 1 0 conda-forge fonttools 4.28.5 py38h497a2fe_0 conda-forge freetype 2.10.4 h0708190_1 conda-forge freexl 1.0.6 h7f98852_0 conda-forge frozenlist 1.2.0 py38h497a2fe_1 conda-forge fsspec 2021.11.1 pyhd8ed1ab_0 conda-forge gdal 3.3.2 py38h81a01a0_3 conda-forge geopandas 0.9.0 pyhd8ed1ab_1 conda-forge geopandas-base 0.9.0 pyhd8ed1ab_1 conda-forge geos 3.9.1 h9c3ff4c_2 conda-forge geotiff 1.7.0 h08e826d_2 conda-forge gettext 0.19.8.1 h73d1719_1008 conda-forge gflags 2.2.2 he1b5a44_1004 conda-forge giflib 5.2.1 h36c2ea0_2 conda-forge glog 0.5.0 h48cff8f_0 conda-forge gmp 6.2.1 h58526e2_0 conda-forge greenlet 1.1.2 py38h709712a_1 conda-forge grpc-cpp 1.42.0 ha1441d3_1 conda-forge hdf4 4.2.15 h10796ff_3 conda-forge hdf5 1.12.1 nompi_h2750804_103 conda-forge heapdict 1.0.1 py_0 conda-forge icu 68.2 h9c3ff4c_0 conda-forge idna 3.1 pyhd3deb0d_0 conda-forge imagecodecs 2021.8.26 py38hb5ce8f7_1 conda-forge imageio 2.13.5 pyh239f2a4_0 conda-forge importlib-metadata 4.10.0 py38h578d9bd_0 conda-forge importlib_metadata 4.10.0 hd8ed1ab_0 conda-forge importlib_resources 5.4.0 pyhd8ed1ab_0 conda-forge ipykernel 6.6.1 py38he5a9106_0 conda-forge ipython 7.31.0 py38h578d9bd_0 conda-forge ipython_genutils 0.2.0 py_1 conda-forge ipywidgets 7.6.5 pyhd8ed1ab_0 conda-forge jbig 2.1 h7f98852_2003 conda-forge jedi 0.18.1 py38h578d9bd_0 conda-forge jinja2 3.0.3 pyhd8ed1ab_0 conda-forge joblib 1.1.0 pyhd8ed1ab_0 conda-forge jpeg 9d h36c2ea0_0 conda-forge json-c 0.15 h98cffda_0 conda-forge json5 0.9.5 pyh9f0ad1d_0 conda-forge jsonschema 4.3.3 pyhd8ed1ab_0 conda-forge jupyter-server-proxy 3.2.0 pyhd8ed1ab_0 conda-forge jupyter_client 7.1.0 pyhd8ed1ab_0 conda-forge jupyter_core 4.9.1 py38h578d9bd_1 conda-forge jupyter_server 1.13.1 pyhd8ed1ab_0 conda-forge jupyterlab 3.2.6 
pyhd8ed1ab_0 conda-forge jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge jupyterlab_server 2.10.3 pyhd8ed1ab_0 conda-forge jupyterlab_widgets 1.0.2 pyhd8ed1ab_0 conda-forge jxrlib 1.1 h7f98852_2 conda-forge kealib 1.4.14 h87e4c3c_3 conda-forge kiwisolver 1.3.2 py38h1fd1430_1 conda-forge krb5 1.19.2 hcc1bbae_3 conda-forge lcms2 2.12 hddcbb42_0 conda-forge ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge lerc 3.0 h9c3ff4c_0 conda-forge libaec 1.0.6 h9c3ff4c_0 conda-forge libblas 3.9.0 12_linux64_openblas conda-forge libbrotlicommon 1.0.9 h7f98852_6 conda-forge libbrotlidec 1.0.9 h7f98852_6 conda-forge libbrotlienc 1.0.9 h7f98852_6 conda-forge libcblas 3.9.0 12_linux64_openblas conda-forge libcucim 22.02.00a220111 cuda11_gab8e6a4_31 rapidsai-nightly libcudf 22.02.00a220111 cuda11_g951f630dfe_250 rapidsai-nightly libcudf_kafka 22.02.00a220111 g951f630dfe_250 rapidsai-nightly libcugraph 22.02.00a220111 cuda11_g6883cc19_66 rapidsai-nightly libcugraph_etl 22.02.00a220111 cuda11_g6883cc19_66 rapidsai-nightly libcuml 22.02.00a220111 cuda11_g416ce61a4_84 rapidsai-nightly libcumlprims 22.02.00a220106 cuda11_g06a42b1_14 rapidsai-nightly libcurl 7.81.0 h2574ce0_0 conda-forge libcusolver 11.3.2.107 hc875929_0 nvidia libcuspatial 22.02.00a220110 cuda11_g55280e3_14 rapidsai-nightly libdap4 3.20.6 hd7c4107_2 conda-forge libdeflate 1.8 h7f98852_0 conda-forge libedit 3.1.20191231 he28a2e2_2 conda-forge libev 4.33 h516909a_1 conda-forge libevent 2.1.10 h9b69904_4 conda-forge libfaiss 1.7.0 cuda112h5bea7ad_8_cuda conda-forge libffi 3.4.2 h7f98852_5 conda-forge libgcc-ng 11.2.0 h1d223b6_11 conda-forge libgcrypt 1.9.4 h7f98852_0 conda-forge libgdal 3.3.2 h6acdded_3 conda-forge libgfortran-ng 11.2.0 h69a702a_11 conda-forge libgfortran5 11.2.0 h5c6108e_11 conda-forge libglib 2.70.2 h174f98d_1 conda-forge libgomp 11.2.0 h1d223b6_11 conda-forge libgpg-error 1.42 h9c3ff4c_0 conda-forge libgsasl 1.10.0 h5b4c23d_0 conda-forge libhwloc 2.3.0 h5e5b7d1_1 conda-forge libiconv 1.16 h516909a_0 
conda-forge libkml 1.3.0 h238a007_1014 conda-forge liblapack 3.9.0 12_linux64_openblas conda-forge libllvm11 11.1.0 hf817b99_2 conda-forge libnetcdf 4.8.1 nompi_hb3fd0d9_101 conda-forge libnghttp2 1.43.0 h812cca2_1 conda-forge libnsl 2.0.0 h7f98852_0 conda-forge libntlm 1.4 h7f98852_1002 conda-forge libopenblas 0.3.18 pthreads_h8fe5266_0 conda-forge libpng 1.6.37 h21135ba_2 conda-forge libpq 13.5 hd57d9b9_1 conda-forge libprotobuf 3.19.2 h780b84a_0 conda-forge librdkafka 1.7.0 hc49e61c_1 conda-forge librmm 22.02.00a220111 cuda11_g5a239d2_25 rapidsai-nightly librttopo 1.1.0 h1185371_6 conda-forge libsodium 1.0.18 h36c2ea0_1 conda-forge libspatialindex 1.9.3 h9c3ff4c_4 conda-forge libspatialite 5.0.1 h5cf074c_8 conda-forge libssh2 1.10.0 ha56f1ee_2 conda-forge libstdcxx-ng 11.2.0 he4da1e4_11 conda-forge libthrift 0.15.0 he6d91bd_1 conda-forge libtiff 4.3.0 h6f004c6_2 conda-forge libutf8proc 2.7.0 h7f98852_0 conda-forge libuuid 2.32.1 h7f98852_1000 conda-forge libuv 1.42.0 h7f98852_0 conda-forge libwebp 1.2.1 h3452ae3_0 conda-forge libwebp-base 1.2.1 h7f98852_0 conda-forge libxcb 1.13 h7f98852_1004 conda-forge libxgboost 1.5.0dev.rapidsai22.02 cuda11.2_0 rapidsai-nightly libxml2 2.9.12 h72842e0_0 conda-forge libzip 1.8.0 h4de3113_1 conda-forge libzlib 1.2.11 h36c2ea0_1013 conda-forge libzopfli 1.0.3 h9c3ff4c_0 conda-forge llvmlite 0.37.0 py38h4630a5e_1 conda-forge locket 0.2.0 py_2 conda-forge lz4-c 1.9.3 h9c3ff4c_1 conda-forge mapclassify 2.4.3 pyhd8ed1ab_0 conda-forge markdown 3.3.6 pyhd8ed1ab_0 conda-forge markupsafe 2.0.1 py38h497a2fe_1 conda-forge matplotlib-base 3.5.1 py38hf4fb855_0 conda-forge matplotlib-inline 0.1.3 pyhd8ed1ab_0 conda-forge mistune 0.8.4 py38h497a2fe_1005 conda-forge msgpack-python 1.0.3 py38h1fd1430_0 conda-forge multidict 5.2.0 py38h497a2fe_1 conda-forge multipledispatch 0.6.0 py_0 conda-forge munch 2.5.0 py_0 conda-forge munkres 1.1.4 pyh9f0ad1d_0 conda-forge nbclassic 0.3.4 pyhd8ed1ab_0 conda-forge nbclient 0.5.9 pyhd8ed1ab_0 conda-forge 
nbconvert 6.4.0 py38h578d9bd_0 conda-forge nbformat 5.1.3 pyhd8ed1ab_0 conda-forge nccl 2.11.4.1 hdc17891_0 conda-forge ncurses 6.2 h58526e2_4 conda-forge nest-asyncio 1.5.4 pyhd8ed1ab_0 conda-forge networkx 2.6.3 pyhd8ed1ab_1 conda-forge nodejs 14.17.4 h92b4a50_0 conda-forge notebook 6.4.6 pyha770c72_0 conda-forge nspr 4.32 h9c3ff4c_1 conda-forge nss 3.74 hb5efdd6_0 conda-forge numba 0.54.1 py38h4bf6c61_0 conda-forge numpy 1.20.3 py38h9894fe3_1 conda-forge nvtx 0.2.3 py38h497a2fe_1 conda-forge olefile 0.46 pyh9f0ad1d_1 conda-forge openjpeg 2.4.0 hb52868f_1 conda-forge openssl 1.1.1l h7f98852_0 conda-forge orc 1.7.2 h1be678f_0 conda-forge oscrypto 1.2.1 pyhd3deb0d_0 conda-forge packaging 21.3 pyhd8ed1ab_0 conda-forge pandas 1.3.5 py38h43a58ef_0 conda-forge pandoc 2.16.2 h7f98852_0 conda-forge pandocfilters 1.5.0 pyhd8ed1ab_0 conda-forge panel 0.12.4 pyhd8ed1ab_0 conda-forge param 1.12.0 pyh6c4a22f_0 conda-forge parquet-cpp 1.5.1 1 conda-forge parso 0.8.3 pyhd8ed1ab_0 conda-forge partd 1.2.0 pyhd8ed1ab_0 conda-forge pcre 8.45 h9c3ff4c_0 conda-forge pexpect 4.8.0 py38h32f6830_1 conda-forge pickleshare 0.7.5 py38h32f6830_1002 conda-forge pillow 8.4.0 py38h8e6f84c_0 conda-forge pip 21.3.1 pyhd8ed1ab_0 conda-forge pixman 0.40.0 h36c2ea0_0 conda-forge pooch 1.5.2 pyhd8ed1ab_0 conda-forge poppler 21.09.0 ha39eefc_3 conda-forge poppler-data 0.4.11 hd8ed1ab_0 conda-forge postgresql 13.5 h2510834_1 conda-forge proj 8.1.0 h277dcde_1 conda-forge prometheus_client 0.12.0 pyhd8ed1ab_0 conda-forge prompt-toolkit 3.0.24 pyha770c72_0 conda-forge protobuf 3.19.2 py38h709712a_0 conda-forge psutil 5.9.0 py38h497a2fe_0 conda-forge pthread-stubs 0.4 h36c2ea0_1001 conda-forge ptxcompiler 0.2.0 py38hb739d79_0 rapidsai-nightly ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge py-xgboost 1.5.0dev.rapidsai22.02 cuda11.2py38_0 rapidsai-nightly pyarrow 5.0.0 py38ha746e9d_22_cuda conda-forge pycparser 2.21 pyhd8ed1ab_0 conda-forge pycryptodomex 3.12.0 py38h497a2fe_0 conda-forge pyct 0.4.6 py_0 
conda-forge pyct-core 0.4.6 py_0 conda-forge pydeck 0.5.0 pyh9f0ad1d_0 conda-forge pyee 8.1.0 pyh9f0ad1d_0 conda-forge pygments 2.11.2 pyhd8ed1ab_0 conda-forge pyjwt 2.3.0 pyhd8ed1ab_1 conda-forge pylibcugraph 22.02.00a220111 cuda11_py38_g6883cc19_66 rapidsai-nightly pynvml 11.4.1 pyhd8ed1ab_0 conda-forge pyopenssl 21.0.0 pyhd8ed1ab_0 conda-forge pyparsing 3.0.6 pyhd8ed1ab_0 conda-forge pyppeteer 0.2.6 pyhd8ed1ab_0 conda-forge pyproj 3.1.0 py38h3701b11_4 conda-forge pyrsistent 0.18.0 py38h497a2fe_0 conda-forge pysocks 1.7.1 py38h578d9bd_4 conda-forge python 3.8.12 hb7a2778_2_cpython conda-forge python-confluent-kafka 1.7.0 py38h497a2fe_2 conda-forge python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge python_abi 3.8 2_cp38 conda-forge pytz 2021.3 pyhd8ed1ab_0 conda-forge pyviz_comms 2.1.0 pyhd8ed1ab_0 conda-forge pywavelets 1.2.0 py38h6c62de6_1 conda-forge pyyaml 6.0 py38h497a2fe_3 conda-forge pyzmq 22.3.0 py38h2035c66_1 conda-forge rapids 22.02.00a220111 cuda11.2_py38_g365c37f_104 rapidsai-nightly rapids-xgboost 22.02.00a220111 cuda11.2_py38_g365c37f_104 rapidsai-nightly re2 2021.11.01 h9c3ff4c_0 conda-forge readline 8.1 h46c0cb4_0 conda-forge requests 2.27.1 pyhd8ed1ab_0 conda-forge rmm 22.02.00a220111 cuda11_py38_g5a239d2_25_has_cma rapidsai-nightly rtree 0.9.7 py38h02d302b_3 conda-forge s2n 1.0.10 h9b69904_0 conda-forge scikit-image 0.18.1 py38h51da96c_0 conda-forge scikit-learn 1.0.2 py38h1561384_0 conda-forge scipy 1.7.3 py38h56a6a73_0 conda-forge send2trash 1.8.0 pyhd8ed1ab_0 conda-forge setuptools 60.5.0 py38h578d9bd_0 conda-forge shapely 1.8.0 py38hb7fe4a8_0 conda-forge simpervisor 0.4 pyhd8ed1ab_0 conda-forge six 1.16.0 pyh6c4a22f_0 conda-forge snappy 1.1.8 he1b5a44_3 conda-forge sniffio 1.2.0 py38h578d9bd_2 conda-forge snowflake-connector-python 2.7.2 py38h8914348_0 conda-forge snowflake-sqlalchemy 1.3.3 pyhd8ed1ab_0 conda-forge sortedcontainers 2.4.0 pyhd8ed1ab_0 conda-forge spdlog 1.8.5 h4bd325d_0 conda-forge sqlalchemy 1.4.29 py38h497a2fe_0 conda-forge 
sqlite 3.37.0 h9cd32fc_0 conda-forge streamz 0.6.3 pyh6c4a22f_0 conda-forge tblib 1.7.0 pyhd8ed1ab_0 conda-forge terminado 0.12.1 py38h578d9bd_1 conda-forge testpath 0.5.0 pyhd8ed1ab_0 conda-forge threadpoolctl 3.0.0 pyh8a188c0_0 conda-forge tifffile 2021.11.2 pyhd8ed1ab_0 conda-forge tiledb 2.3.4 he87e0bf_0 conda-forge tk 8.6.11 h27826a3_1 conda-forge toolz 0.11.2 pyhd8ed1ab_0 conda-forge tornado 6.1 py38h497a2fe_2 conda-forge tqdm 4.62.3 pyhd8ed1ab_0 conda-forge traitlets 5.1.1 pyhd8ed1ab_0 conda-forge treelite 2.1.0 py38hdd725b4_0 conda-forge treelite-runtime 2.1.0 pypi_0 pypi typing-extensions 4.0.1 hd8ed1ab_0 conda-forge typing_extensions 4.0.1 pyha770c72_0 conda-forge tzcode 2021e h7f98852_0 conda-forge tzdata 2021e he74cb21_0 conda-forge ucx 1.11.2+gef2bbcf cuda11.2_0 rapidsai-nightly ucx-proc 1.0.0 gpu rapidsai-nightly ucx-py 0.24.0a220111 py38_gef2bbcf_24 rapidsai-nightly unicodedata2 14.0.0 py38h497a2fe_0 conda-forge urllib3 1.26.8 pyhd8ed1ab_1 conda-forge wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge webencodings 0.5.1 py_1 conda-forge websocket-client 1.2.3 pyhd8ed1ab_0 conda-forge websockets 9.1 py38h497a2fe_0 conda-forge wheel 0.37.1 pyhd8ed1ab_0 conda-forge widgetsnbextension 3.5.2 py38h578d9bd_1 conda-forge xarray 0.20.2 pyhd8ed1ab_0 conda-forge xerces-c 3.2.3 h9d8b166_3 conda-forge xgboost 1.5.0dev.rapidsai22.02 cuda11.2py38_0 rapidsai-nightly xorg-kbproto 1.0.7 h7f98852_1002 conda-forge xorg-libice 1.0.10 h7f98852_0 conda-forge xorg-libsm 1.2.3 hd9c2040_1000 conda-forge xorg-libx11 1.7.2 h7f98852_0 conda-forge xorg-libxau 1.0.9 h7f98852_0 conda-forge xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge xorg-libxext 1.3.4 h7f98852_1 conda-forge xorg-libxrender 0.9.10 h7f98852_1003 conda-forge xorg-renderproto 0.11.1 h7f98852_1002 conda-forge xorg-xextproto 7.3.0 h7f98852_1002 conda-forge xorg-xproto 7.0.31 h7f98852_1007 conda-forge xz 5.2.5 h516909a_1 conda-forge yaml 0.2.5 h7f98852_2 conda-forge yarl 1.7.2 py38h497a2fe_1 conda-forge zeromq 4.3.4 h9c3ff4c_1 
conda-forge zfp 0.5.5 h9c3ff4c_8 conda-forge zict 2.0.0 py_0 conda-forge zipp 3.7.0 pyhd8ed1ab_0 conda-forge zlib 1.2.11 h36c2ea0_1013 conda-forge zstd 1.5.1 ha95c52a_0 conda-forge

github-actions · 2022-02-11T15:08:29Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

jhancock1975 · 2022-03-19T02:23:00Z

I ran into this while working with pandas DataFrames.

After using imblearn's RandomUnderSampler, my binary labels were in order, i.e. all 0s followed by all 1s.

I used DataFrame.sample to shuffle the data, and I had to reset the indices to make the error go away. Here is the snippet:

X_train, y_train = rus.fit_resample(X_train, y_train)
# After fit_resample the labels are in order. When the data is sharded,
# some estimators receive only one label value (e.g. all 1s), causing the error
# "The labels need to be consecutive values from 0 to the number of unique
# label values".
# So we recombine X and y into one dataframe, shuffle, then split them back up.
X_train['y'] = y_train
X_train = X_train.sample(frac=1.0).reset_index(drop=True)
y_train = X_train['y']
X_train.drop('y', axis=1, inplace=True)

beckernick · 2022-03-21T20:52:44Z

Generalizing from @jhancock1975's code snippet, this error can occur even with correctly formatted data during cross-validation (if a fold's training split doesn't contain the full set of labels).

import pandas as pd
import cuml
from sklearn.model_selection import cross_val_score

df = pd.DataFrame({
    "x1": [0.0,1,2,3,3,4],
    "x2": [-3,2.0,5,5,3,2],
    "y": [0,1,1,1,2,2]
})

clf = cuml.ensemble.RandomForestClassifier()

cross_val_score(
    clf,
    df[["x1", "x2"]],
    df["y"],
    cv=2,
    error_score="raise"
)
# Throws the error linked above
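
To see why a fold can fail, the sketch below splits the same `y` into two contiguous halves and lists the labels present in each training split. It assumes an unshuffled 2-fold split; whether `cross_val_score` uses exactly this splitter for a cuML estimator is an assumption, not something verified here.

```python
import numpy as np

y = np.array([0, 1, 1, 1, 2, 2])
all_idx = np.arange(len(y))

# Assumption: an unshuffled 2-fold split, i.e. two contiguous halves.
folds = np.array_split(all_idx, 2)

train_labels_per_fold = []
for i, test_idx in enumerate(folds):
    # Everything that is not in the test fold is used for training.
    train_idx = np.setdiff1d(all_idx, test_idx)
    train_labels = np.unique(y[train_idx]).tolist()
    train_labels_per_fold.append(train_labels)
    print(f"fold {i}: training labels {train_labels}")
```

Under that assumption, one training split contains only the labels 1 and 2, which are valid labels from the full dataset but do not start at 0 when seen in isolation.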

beckernick · 2022-03-23T19:18:30Z

Scikit-learn does this under the hood here:

https://github.com/scikit-learn/scikit-learn/blob/e5736afb316038c43301d2c53ce39f9a89b64495/sklearn/ensemble/_forest.py#L371

https://github.com/scikit-learn/scikit-learn/blob/e5736afb316038c43301d2c53ce39f9a89b64495/sklearn/ensemble/_forest.py#L756-L775
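
As a rough illustration of that pattern (a simplified sketch, not scikit-learn's or cuML's actual code): labels are encoded to [0, n) with `np.unique(..., return_inverse=True)` before fitting, and integer predictions are mapped back to the original values afterwards. The `encoded_predictions` array is made up for illustration.

```python
import numpy as np

y = np.array([-3, 0, 4.0, 0])

# Encode: `classes` holds the sorted unique labels and `y_encoded` holds,
# for each sample, the index of its label within `classes`.
classes, y_encoded = np.unique(y, return_inverse=True)
y_encoded = y_encoded.reshape(-1)  # keep the result 1-D

# ... an estimator would be fit on y_encoded, which now lies in [0, n) ...

# Hypothetical integer predictions, for illustration only.
encoded_predictions = np.array([2, 0, 1])

# Decode: map integer predictions back to the original label values.
decoded = classes.take(encoded_predictions)

print(classes, y_encoded, decoded)
```

The same two steps can be applied manually around a cuML estimator as a workaround until the estimator handles this itself.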

github-actions · 2022-04-22T20:02:55Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

beckernick · 2022-05-05T18:53:32Z

It should be possible to do this using cuml.prims.label.make_monotonic, which was designed for this use case.

This function looks like it might have an expensive JIT cost, though. Perhaps

cuml/cpp/src_prims/label/classlabels.cuh

Line 164 in 768a4ed

void make_monotonic(Type* out, Type* in, size_t N, cudaStream_t stream)

may be relevant here?

cjnolet · 2022-05-05T19:15:37Z

@beckernick, yep, this is indeed something we had written originally in C++ in cuml for DBSCAN and have since moved to RAFT (pending removal from cuml, the raft version is more up to date). If desired, this could also be a good reason to expose through pylibraft and use in cuml (reusable and very clean separation of implementation details).

beckernick · 2022-05-05T19:21:31Z

That sounds like it could be a good solution. I suspect this non-consecutive label issue will keep popping up.

Will file a new issue on RAFT, cross-link, and mark it as a good first issue based on your description.

github-actions · 2022-07-03T01:33:05Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

…ive labels where appropriate (#4780)

This PR closes #4478 by transforming non-consecutive labels outside of [0, n) to consecutive labels inside [0, n), similar to what scikit-learn does under the hood. Closes #691

Authors:
- https://github.com/VamsiTallam95

Approvers:
- Micka (https://github.com/lowener)
- Dante Gama Dessavre (https://github.com/dantegd)
- Corey J. Nolet (https://github.com/cjnolet)

URL: #4780


beckernick added the "feature request" (New feature or request) and "? - Needs Triage" (Need team to review and classify) labels Jan 12, 2022

This was referenced Jan 12, 2022

[QST] Keep Getting Error: "The labels need to be consecutive values from 0 to the number of unique label values" rapidsai/cudf#10024

Closed

[BUG] How To Pass cuDF Dataframe to cuML.ensemble.RandomForestClassifier? #4480

Closed

github-actions bot added the inactive-30d label Feb 11, 2022

github-actions bot removed the inactive-30d label Mar 19, 2022

github-actions bot added the inactive-30d label Apr 22, 2022

github-actions bot removed the inactive-30d label May 5, 2022

beckernick mentioned this issue May 5, 2022

[FEA] Expose make_monotonic in pylibraft rapidsai/raft#640

Open

beckernick changed the title ~~[FEA] cuML estimators should support non-consecutive labels ouside of [0, n) where appropriate~~ [FEA] cuML estimators should support non-consecutive labels outside of [0, n) where appropriate May 5, 2022

VamsiTallam95 mentioned this issue Jun 17, 2022

Transforms RandomForest estimators non-consecutive labels to consecutive labels where appropriate #4780

Merged

github-actions bot added the inactive-30d label Jul 3, 2022

rapids-bot bot closed this as completed in #4780 Sep 29, 2022
