[BUG] How To Pass cuDF Dataframe to cuML.ensemble.RandomForestClassifier? #4480

mexicantexan · 2022-01-12T16:11:26Z

What is your question?
I'm trying to fit data to the cuml.ensemble.RandomForestClassifier and I keep getting the error: "The labels need to be consecutive values from 0 to the number of unique label values"

I'm passing cudf.DataFrame objects into the function which have the same number of rows but differing number of columns. The column labels start at 0 and step by 1 up to the final column (in the example below 108). What am I doing wrong? I've attached a printout of the dataframes that I'm passing in below and some code for context:

clf1 = modelClass(max_depth=D1, random_state=random.randrange(0, 1024, 1), n_bins=15, n_streams=4, split_criterion=criterion, bootstrap=bootstrap, n_estimators=trs1)
clf1.fit(X1, Y1)
X1's dataframe looks like this:

	0	1	2	...	107	108
0	1.000000e-11	1.000000e-11	1.647421e-01	...	1.000000e-11	1.647421e-01
1	1.000000e-11	1.000000e-11	1.760000e-02	...	1.000000e-11	1.760000e-02
2	1.000000e-11	1.000000e-11	-1.772000e-01	...	1.000000e-11	-1.772000e-01
3	1.000000e-11	1.000000e-11	8.254000e-01	...	1.000000e-11	8.254000e-01
4	1.000000e-11	1.000000e-11	2.587000e-01	...	1.000000e-11	2.587000e-01
...	...	...	...	...	...	...
5402	1.000000e-11	1.000000e-11	1.704444e-01	...	1.000000e-11	1.704444e-01
5403	1.000000e-11	1.000000e-11	-1.860000e-01	...	1.000000e-11	-1.860000e-01
5404	0.000000e+00	1.000000e-11	1.229714e-01	...	1.000000e-11	1.229714e-01
5405	1.000000e-11	1.959500e-01	1.984667e-01	...	1.959500e-01	1.984667e-01
5406	1.000000e-11	1.000000e-11	1.000000e-11	...	1.000000e-11	1.000000e-11

[5407 rows x 109 columns]; dtype=('0', dtype('float64')); <cudf.core.dataframe._DataFrameLocIndexer object at 0x7f9c3d0f3070>

Y1's Dataframe looks like this:

	0
0	-2
1	4
2	-3
3	1
4	0
...	...
5402	0
5403	-2
5404	0
5405	0
5406	0

[5407 rows x 1 columns]; dtype=('0', dtype('int32')); <cudf.core.dataframe._DataFrameLocIndexer object at 0x7f9c1b847b50>

System Information: Ubuntu 20.04, Titan RTX, CUDA 11.5, Rapids 21.12 built-in Conda, Python 3.8

mexicantexan · 2022-01-13T15:28:01Z

I've spent 14 hours trying different dataframes/Series, pandas dataframes, numpy arrays, and different data types, and I can't get the RandomForestClassifier to fit. It always comes back to: "The labels need to be consecutive values from 0 to the number of unique label values"

I've manually gone through and adjusted each label to a number, iterated a for loop over the labels so that they start at 0 and go up to the max number of columns, saved everything to an Excel sheet, and triple-checked that the labels are correct and that there's no missing data.

Any help would be appreciated.

beckernick · 2022-01-13T16:20:39Z

Your y column is not made up of consecutive values in [0, n), so you are hitting this bug: https://github.com//issues/4478 . You need to encode your y column to be in that range.
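To make the condition in the error message concrete, here is a minimal plain-Python sketch. It is not cuML's validation code; the helper names `is_consecutive_from_zero` and `encode_labels` are made up for illustration, and the sample values are copied from the Y1 printout above. It checks whether the distinct labels are exactly 0, 1, ..., n - 1 and, if not, remaps them by sorted rank.

```python
def is_consecutive_from_zero(labels):
    """Return True if the distinct labels are exactly 0, 1, ..., n_unique - 1."""
    unique = sorted(set(labels))
    return unique == list(range(len(unique)))


def encode_labels(labels):
    """Map each distinct label to its rank among the sorted distinct labels."""
    classes = sorted(set(labels))
    to_code = {label: code for code, label in enumerate(classes)}
    return [to_code[label] for label in labels], classes


# Sample values taken from the Y1 printout (hypothetical subset of the column).
y = [-2, 4, -3, 1, 0, 0, -2, 0, 0, 0]

print(is_consecutive_from_zero(y))          # False: -3, -2 and 4 fall outside 0..4
y_encoded, classes = encode_labels(y)
print(classes)                              # [-3, -2, 0, 1, 4]
print(y_encoded)                            # [1, 4, 0, 3, 2, 2, 1, 2, 2, 2]
print(is_consecutive_from_zero(y_encoded))  # True
```

Note that the check concerns the values in y, not the column names of X or y.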

Starting from the example in that issue, you could do the following (as one example):

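import cudf
import cuml
from cuml.ensemble import RandomForestClassifier

df = cudf.DataFrame({
    "x1": [0.0, 1, 2],
    "x2": [-3, 2.0, 5],
    "y": [-3, 0, 4.0]
})

# Map the raw labels (-3, 0, 4) onto consecutive integers (0, 1, 2).
enc = cuml.preprocessing.LabelEncoder()
df["y_consecutive"] = enc.fit_transform(df.y)

# Classifier with default parameters (assumed; the snippet does not show how clf was created).
clf = RandomForestClassifier()
print(clf.fit(df[["x1", "x2"]], df["y_consecutive"]))
# Output: RandomForestClassifier()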
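One consequence of encoding: a model fitted on encoded labels will predict encoded labels, so predictions need to be mapped back to the original values. With cuML's LabelEncoder this should be possible via `enc.inverse_transform(...)`. The plain-Python sketch below only illustrates the idea; `predicted_codes` is a hypothetical stand-in for the output of `clf.predict(...)`, and `classes` uses the label values from the question's Y1 printout.

```python
# Distinct original labels in sorted order; the label at position i has code i.
classes = [-3, -2, 0, 1, 4]

# Hypothetical encoded predictions, standing in for the output of clf.predict(...).
predicted_codes = [2, 0, 4, 1]

# Decoding is a lookup of each code in the sorted class list.
predicted_labels = [classes[code] for code in predicted_codes]
print(predicted_labels)  # [0, -3, 4, -2]
```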

I would recommend we close this issue and continue discussion of the bug in the linked issue.

mexicantexan added the "? - Needs Triage" (Need team to review and classify) and "question" (Further information is requested) labels Jan 12, 2022

mexicantexan changed the title ~~[QST] How To Pass cuDF Dataframe to cuML.ensemble.RandomForestClassifier?~~ [BUG] How To Pass cuDF Dataframe to cuML.ensemble.RandomForestClassifier? Jan 13, 2022

mexicantexan closed this as completed Jan 13, 2022
