[BUG] Logistic regression does not return fit status #2546

tfeher · 2020-07-11T17:26:32Z

Describe the bug
Logistic regression uses the QN solver. The QN solver defines a set of return codes

Lines 40 to 46 in 39e1bb2

    
    enum OPT_RETCODE {
      OPT_SUCCESS = 0,
      OPT_NUMERIC_ERROR = 1,
      OPT_LS_FAILED = 2,
      OPT_MAX_ITERS_REACHED = 3,
      OPT_INVALID_ARGS = 4
    };

While qn_fit returns these codes to the caller, qnFit ignores the returned error code. This way the Python layer is not informed of the fit status.

If the solver exits after the first iteration with a numerical error, the Python user is not informed of the error and only sees insufficient model accuracy, as in this issue.
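One way to surface the solver status on the Python side would be to translate the return code into a Python warning or exception. The sketch below is a plain-Python illustration, not cuML's implementation: `OptRetcode` mirrors the C++ `OPT_RETCODE` values quoted above, while `report_fit_status`, the message texts, and the choice of which codes warn and which raise are assumptions made for the example.

```python
import warnings
from enum import IntEnum


class OptRetcode(IntEnum):
    """Python mirror of the C++ OPT_RETCODE enum quoted in this issue."""
    OPT_SUCCESS = 0
    OPT_NUMERIC_ERROR = 1
    OPT_LS_FAILED = 2
    OPT_MAX_ITERS_REACHED = 3
    OPT_INVALID_ARGS = 4


def report_fit_status(retcode):
    """Hypothetical helper: turn a solver return code into a Python-level signal.

    Assumption for this sketch: non-convergence is reported as a warning
    (which Jupyter shows in the cell output), other failures as exceptions.
    """
    status = OptRetcode(retcode)
    if status is OptRetcode.OPT_SUCCESS:
        return status
    if status in (OptRetcode.OPT_MAX_ITERS_REACHED, OptRetcode.OPT_LS_FAILED):
        warnings.warn(
            f"QN solver did not converge ({status.name})", RuntimeWarning
        )
        return status
    # OPT_NUMERIC_ERROR and OPT_INVALID_ARGS are treated as hard failures here.
    raise RuntimeError(f"QN solver failed ({status.name})")
```

Because the signal goes through Python's `warnings` module rather than the C++ logger, a caller could also capture it with `warnings.catch_warnings(record=True)` and react to it in code.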

PR #2543 improved the C++ side logging to print a warning message if the solver has not converged, and an error message if a numerical error is found. But this might be hidden from a user who is running a Jupyter notebook.

Steps/Code to reproduce bug

from sklearn.datasets import load_breast_cancer
from cuml.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

cu_clf = LogisticRegression(max_iter=10)
cu_clf.fit(X, y.astype(float))

The C++ layer prints a warning message on the standard output: L-BFGS: max iterations reached, but no such message is visible in a Jupyter notebook.

In contrast, scikit-learn

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression as sk_log_reg

X, y = load_breast_cancer(return_X_y=True)

sk_clf = sk_log_reg(max_iter=10)
sk_clf.fit(X, y)

gives the following output in a Jupyter notebook:

/home/tfeher/gpfs/anaconda3/envs/cuml_branch15a/lib/python3.7/site-packages/sklearn/linear_model/_logistic.py:764: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)

Expected behavior
Give an informative message about the fit status to Jupyter notebook users.


teju85 · 2020-07-24T10:50:07Z

What if we added a _fit_status member variable inside the LogisticRegression class? That could be used to check the result of the training programmatically.
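A rough sketch of what this suggestion could look like, using a toy class rather than the real estimator. Everything here is hypothetical: `ToyEstimator`, the `converged` property, and `fake_solver` are invented for illustration, and only the attribute name `_fit_status` and the integer codes (0 for success, 3 for max iterations reached) come from this thread.

```python
OPT_SUCCESS = 0            # values taken from the OPT_RETCODE enum in this issue
OPT_MAX_ITERS_REACHED = 3


class ToyEstimator:
    """Hypothetical stand-in for an estimator; not cuML's LogisticRegression."""

    def __init__(self, max_iter=1000):
        self.max_iter = max_iter
        self._fit_status = None  # unset until fit() has run

    def fit(self, solver):
        # `solver` is any callable that takes max_iter and returns an
        # OPT_RETCODE-style integer; the status is stored, not discarded.
        self._fit_status = int(solver(self.max_iter))
        return self

    @property
    def converged(self):
        if self._fit_status is None:
            raise RuntimeError("fit() has not been called yet")
        return self._fit_status == OPT_SUCCESS


def fake_solver(max_iter):
    """Pretend solver that needs at least 100 iterations to converge."""
    return OPT_SUCCESS if max_iter >= 100 else OPT_MAX_ITERS_REACHED


est = ToyEstimator(max_iter=10).fit(fake_solver)
if not est.converged:
    print("fit status:", est._fit_status)
```

With a stored status, calling code could branch on the outcome (for example, retry with a larger `max_iter`) instead of relying on log output.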

github-actions · 2021-02-17T21:16:49Z

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

github-actions · 2021-02-17T21:16:49Z

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

tfeher · 2021-02-18T15:19:53Z

The warning message is correctly displayed in the Jupyter notebook:

[W] [14:42:45.849061] L-BFGS: max iterations reached

This basically solves the main issue here. We might want to add a few notes on what to do when the max iterations are reached, along the lines of what sklearn does:

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html

I am preparing a small PR to implement this.

Since sklearn does not return a fit status, I think we can also skip it.

@tfeher

closes #2546

This PR improves the warning message printed when max iterations are reached during fitting a linear model.

Example:
```python
import numpy as np
from cuml.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
y = y.astype(np.float64)
cls = LogisticRegression(penalty='none', C=1)
cls.fit(X, y)
```
This produces the following output, where the last line is added by this PR:
```
[W] [15:31:04.467478] L-BFGS: max iterations reached
[W] [15:31:04.467804] Maximum iterations reached before solver is converged. To increase model accuracy you can increase the number of iterations (max_iter) or improve the scaling of the input data.
```

Authors:
- Tamas Bela Feher (@tfeher)

Approvers:
- Dante Gama Dessavre (@dantegd)

URL: #3515

tfeher added the "? - Needs Triage" and "bug" labels Jul 11, 2020

tfeher mentioned this issue Jul 11, 2020

[BUG] LogisticRegression suffers from accuracy loss when penalty is enabled #2478

Closed

tfeher mentioned this issue Jul 23, 2020

Accuracy issues in Logistic Regression with L1 penalty #1293

Closed

github-actions bot added rotten labels Feb 17, 2021

tfeher removed the "? - Needs Triage" label Feb 18, 2021

tfeher mentioned this issue Feb 18, 2021

[Review] Improve warning message when QN solver reaches max_iter #3515

Merged

rapids-bot bot closed this as completed in #3515 Feb 19, 2021
