Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Logistic regression does not return fit status #2546

Closed
tfeher opened this issue Jul 11, 2020 · 4 comments · Fixed by #3515
Closed

[BUG] Logistic regression does not return fit status #2546

tfeher opened this issue Jul 11, 2020 · 4 comments · Fixed by #3515
Labels
bug Something isn't working

Comments

@tfeher
Copy link
Contributor

tfeher commented Jul 11, 2020

Describe the bug
Logistic regression uses the QN solver. The QN solver defines a set of return codes

enum OPT_RETCODE {
OPT_SUCCESS = 0,
OPT_NUMERIC_ERROR = 1,
OPT_LS_FAILED = 2,
OPT_MAX_ITERS_REACHED = 3,
OPT_INVALID_ARGS = 4
};

While qn_fit returns these codes to the caller, qnFit ignores the returned error code. This way the Python layer is not informed of the fit status.

If the solver exits after the first iteration with numerical error, the Python user is not informed about the error, and only sees insufficent accuracy of the model, like in this issue.

PR #2543 improved the C++ side logging to print a warning message if the solver is not converged, and error message if numerical error is found. But this might be hidden for user who is running a Jupyter notebook.

Steps/Code to reproduce bug

from sklearn.datasets import load_breast_cancer
from cuml.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

cu_clf = LogisticRegression(max_iter=10)
cu_clf.fit(X, y.astype(float))

The C++ layer prints a warning message on the standard output: L-BFGS: max iterations reached, but no such message visible in a jupyter notebook.

In contrast Scikit learn

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression as sk_log_reg

X, y = load_breast_cancer(return_X_y=True)

sk_clf = sk_log_reg(max_iter=10)
sk_clf.fit(X, y)

gives the following output in a jupyter notebook:

/home/tfeher/gpfs/anaconda3/envs/cuml_branch15a/lib/python3.7/site-packages/sklearn/linear_model/_logistic.py:764: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)

Expected behavior
Give informative message about fit status for Jupyter notebook users.

@teju85
Copy link
Member

teju85 commented Jul 24, 2020

What if we added a _fit_status member variable inside LogisticRegression class? That can be used to programmatically know the result of the training.

@github-actions
Copy link

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@github-actions
Copy link

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@tfeher tfeher removed the ? - Needs Triage Need team to review and classify label Feb 18, 2021
@tfeher
Copy link
Contributor Author

tfeher commented Feb 18, 2021

The warning message is correctly displayed in the Jupyter notebook:

[W] [14:42:45.849061] L-BFGS: max iterations reached

This basically solves the main inssue here. We might want to add a few notes what to do in case the max iterations are reached, along the line what sklearn does

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html

I am preparing a small PR to implement this.

Since sklearn does not return a fit status, I think we can also skip it.

rapids-bot bot pushed a commit that referenced this issue Feb 19, 2021
closes #2546

This PR improves the warning message printed when max iterations are reached during fitting a linear model.

Example:
```python
import numpy as np
from cuml.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
X, y = load_breast_cancer(return_X_y=True)
y = y.astype(np.float64)
cls = LogisticRegression(penalty='none', C=1)
cls.fit(X, y)
```
This produces the following output, where the last line is added by this PR:
```
[W] [15:31:04.467478] L-BFGS: max iterations reached
[W] [15:31:04.467804] Maximum iterations reached before solver is converged. To increase model accuracy you can increase the number of iterations (max_iter) or improve the scaling of the input data.
```

Authors:
  - Tamas Bela Feher (@tfeher)

Approvers:
  - Dante Gama Dessavre (@dantegd)

URL: #3515
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants