Handle binary classifier with all-0 labels #5810

Merged: 1 commit merged into rapidsai:branch-24.06 from the fix_ident_labels branch, Apr 17, 2024

@hcho3 hcho3 commented Mar 20, 2024

**Bug**: Currently, cuML `RandomForestClassifier` fails with the error `More than one variable expected for classification problem` when the labels are all 0's. The error arises when the `RandomForestClassifier` object is translated into a Treelite object.

Minimal reproducer:

```python
from cuml.ensemble import RandomForestClassifier
import numpy as np

X = np.array([[3, 2], [2, 3]], dtype=np.float32)
y = np.array([0, 0], dtype=np.int32)

clf = RandomForestClassifier(max_depth=1)
clf.fit(X, y)
clf.predict(X)   # fails here
```

**Fix**: When `RandomForestClassifier` produces a single output, zero-pad it to length 2 before passing it to Treelite.
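
The zero-padding idea can be sketched in NumPy. This is only an illustration of the concept, not the code in this PR (the actual change is in cuML's C++ sources); the helper name `pad_leaf_output` and the `min_classes` parameter are hypothetical and exist in neither cuML nor Treelite.

```python
import numpy as np


def pad_leaf_output(leaf_output, min_classes=2):
    """Zero-pad a per-class leaf vector to at least `min_classes` entries.

    Hypothetical helper for illustration; not part of cuML or Treelite.
    """
    leaf_output = np.asarray(leaf_output, dtype=np.float32)
    n_missing = max(0, min_classes - leaf_output.shape[0])
    # Append zeros for the classes that never appeared in the training labels.
    return np.pad(leaf_output, (0, n_missing))


# With all-0 labels, each leaf carries a single-entry output.
single_class_leaf = [1.0]
padded = pad_leaf_output(single_class_leaf)

# The padded vector has two entries, and the arg-max is still class 0.
predicted_class = int(np.argmax(padded))
print(padded, predicted_class)
```

Because the appended entries are zero, the predicted class is unchanged; the padding only makes the output look like a two-class problem to the consumer.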

@hcho3 hcho3 requested a review from a team as a code owner March 20, 2024 03:21

copy-pr-bot bot commented Mar 20, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@hcho3 hcho3 force-pushed the fix_ident_labels branch 2 times, most recently from 94b4df9 to f4bdbcf March 20, 2024 03:23

@dantegd dantegd left a comment

I just have one minor comment.

cpp/src/decisiontree/decisiontree.cuh (resolved)
@hcho3 hcho3 added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Mar 20, 2024
@hcho3 hcho3 changed the base branch from branch-24.04 to branch-24.06 April 16, 2024 22:00
@hcho3 hcho3 force-pushed the fix_ident_labels branch from ec26060 to 0f4d38c April 16, 2024 22:01

dantegd commented Apr 16, 2024

/merge

@rapids-bot rapids-bot bot merged commit 654d95a into rapidsai:branch-24.06 Apr 17, 2024
64 checks passed
@hcho3 hcho3 deleted the fix_ident_labels branch April 17, 2024 03:34
jjacobelli pushed a commit to jjacobelli/cuml that referenced this pull request Apr 17, 2024

Authors:
  - Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#5810
Labels: CUDA/C++, improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)