What is your question?
I'm trying to fit a cuml.ensemble.RandomForestClassifier, and I keep getting the error: "The labels need to be consecutive values from 0 to the number of unique label values"
I'm passing cudf.DataFrame objects to fit(); they have the same number of rows but different numbers of columns. The column labels start at 0 and increase by 1 up to the last column (108 in the example below). What am I doing wrong? Some code and a printout of the dataframes I'm passing in are below for context.
I've spent 14 hours trying different inputs (cuDF DataFrames and Series, pandas DataFrames, NumPy arrays, different data types), and I can't get the RandomForestClassifier to fit. It always fails with: "The labels need to be consecutive values from 0 to the number of unique label values"
I've manually converted each label to a number, used a for loop to relabel the columns from 0 up to the last column index, saved everything to an Excel sheet, and triple-checked that the labels are correct and that no data is missing.
Any help would be appreciated.
mexicantexan changed the title from "[QST] How To Pass cuDF Dataframe to cuML.ensemble.RandomForestClassifier?" to "[BUG] How To Pass cuDF Dataframe to cuML.ensemble.RandomForestClassifier?" on Jan 13, 2022
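The "labels" in the error message are the class values in y, not the column labels of X. Your y column does not contain consecutive integers in [0, n), where n is the number of unique classes, so you are hitting this bug: https://github.com//issues/4478. You need to encode your y column into that range.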
Starting from the example in that issue, one way to do this is:
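A minimal sketch of that encoding step, assuming the labels can be copied to the host as a NumPy array (for example via `Y1.to_pandas()`). `encode_labels` is a hypothetical helper, not a cuML API, and this is not necessarily the code from the linked issue. `np.unique(..., return_inverse=True)` returns the sorted distinct classes plus, for each row, the index of its class, which is exactly an encoding into 0..n-1:

```python
import numpy as np

def encode_labels(y):
    """Map arbitrary class values to consecutive int32 codes 0..n_classes-1.

    Returns (codes, classes); classes[codes] recovers the original values.
    """
    y = np.asarray(y).ravel()  # accept a 1-column frame/array or a 1-D series
    classes, codes = np.unique(y, return_inverse=True)
    return codes.astype(np.int32), classes

# Example: class values that do not run consecutively from 0 (1, 5, 7).
y_raw = np.array([5, 1, 7, 5, 1], dtype=np.int32)
y_enc, classes = encode_labels(y_raw)
print(y_enc)    # [1 0 2 1 0]
print(classes)  # [1 5 7]
```

Fit on `y_enc` instead of the raw labels, and map predictions back to the original class values with `classes[pred]` once `pred` is on the host. To stay on the GPU, cuML also provides `cuml.preprocessing.LabelEncoder` (with `fit_transform` and `inverse_transform`), which should do the same job on a cudf.Series such as `Y1.iloc[:, 0]`; check it against your cuML version.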
The code and dataframe printouts from the question, for context:
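```python
import random
# modelClass is cuml.ensemble.RandomForestClassifier; D1, criterion, bootstrap and trs1 are defined elsewhere
clf1 = modelClass(max_depth=D1, random_state=random.randrange(0, 1024, 1),
                  n_bins=15, n_streams=4, split_criterion=criterion, bootstrap=bootstrap, n_estimators=trs1)
clf1.fit(X1, Y1)
```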
X1's dataframe: 5407 rows x 109 columns; column '0' has dtype float64.
Y1's dataframe: 5407 rows x 1 column; column '0' has dtype int32.
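The printout shows Y1's shape and dtype but not its values, and the error is about the values. A quick check, sketched under the assumption that Y1 is first copied to the host (for example with `Y1.to_pandas()`); `labels_are_consecutive` is a hypothetical helper name:

```python
import numpy as np

def labels_are_consecutive(y):
    """Return True if the distinct values of y are exactly 0, 1, ..., n_classes - 1."""
    uniq = np.unique(np.asarray(y).ravel())  # sorted distinct class values
    return np.array_equal(uniq, np.arange(uniq.size))

print(labels_are_consecutive([0, 2, 1, 0]))  # True
print(labels_are_consecutive([1, 2, 3]))     # False: does not start at 0
print(labels_are_consecutive([0, 1, 3]))     # False: 2 is missing
```

If this returns False for Y1, the labels need to be re-encoded before calling fit().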
System information: Ubuntu 20.04, Titan RTX, CUDA 11.5, RAPIDS 21.12 installed with Conda, Python 3.8