Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tolerate QN linesearch failures when it's harmless #3791

Merged

Conversation

achirkin
Copy link
Contributor

@achirkin achirkin commented Apr 23, 2021

In some cases, the linesearch subroutine of the Quasi-Newton solver (logistic regression) may fail in a way that is tolerable for the solver. Currently, any linesearch failure makes the outer algorithm stop, and most of the time the found solution seems to be acceptable to a good precision. This proposal suggests to ignore some non-critical failures ( LS_MAX_ITERS_REACHED and LS_INVALID_STEP_MIN) and stop peacefully if the convergence check triggers or if the function change is too small.

The PR mostly changes the logging behavior, with one small exception. The last condition in update_and_check won't trigger if the linesearch error is not critical and if the target function goes down (i.e. the solver still advances). Previously, the solver would stop independently of the target change.

@achirkin achirkin added 2 - In Progress Currenty a work in progress bug Something isn't working non-breaking Non-breaking change labels Apr 23, 2021
@achirkin achirkin requested a review from tfeher April 23, 2021 16:16
@achirkin achirkin changed the title [WIP] Tolerate QN linesearch failures when it's harmless Tolerate QN linesearch failures when it's harmless Apr 26, 2021
@achirkin achirkin marked this pull request as ready for review April 26, 2021 13:58
@achirkin achirkin requested a review from a team as a code owner April 26, 2021 13:58
@achirkin achirkin added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currenty a work in progress labels Apr 26, 2021
@achirkin
Copy link
Contributor Author

achirkin commented Apr 26, 2021

Note, this PR makes more sense with #3777 , which sets a reasonable default for param.delta. And it ought to be the last PR to close #3645 (I'll check and close it manually if that's the case).

@cjnolet cjnolet self-assigned this Apr 26, 2021
Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the theory behind this approach sounds fine to me- though it might lead to a marginal decrease in perf, it seems like it should provide a better solution in these cases. I just have one little nitpick and a question.

cpp/src/glm/qn/qn_solvers.cuh Outdated Show resolved Hide resolved
cpp/src/glm/qn/qn_solvers.cuh Outdated Show resolved Hide resolved
@achirkin achirkin marked this pull request as draft April 27, 2021 09:18
@achirkin achirkin added 2 - In Progress Currenty a work in progress and removed 3 - Ready for Review Ready for review by team labels Apr 27, 2021
@achirkin achirkin force-pushed the bug-ext-qn-linesearch-failure branch from 580cf77 to 687eb35 Compare April 27, 2021 10:06
@achirkin
Copy link
Contributor Author

NB: sometimes, linesearch fails with LS_MAX_ITERS_REACHED while increasing the loss by a small value. Normally, this indicates that the solver has converged, but it could also mean there is a bug in the solver somewhere. In either case we stop the solver, but I am not sure whether we should report that as info, warning, or error.

@dantegd dantegd added 4 - Waiting on Reviewer Waiting for reviewer to review or respond and removed 2 - In Progress Currenty a work in progress labels May 14, 2021
@dantegd
Copy link
Member

dantegd commented May 14, 2021

@achirkin I was wondering what's the status of this PR? It seems to me that it has addressed @cjnolet's comment, if so is it waiting for a re-review?

@cjnolet
Copy link
Member

cjnolet commented May 14, 2021

@dantegd this PR was converted back to a draft after my review to I was under the impression @achirkin was still ironing some things out. @achirkin, please let me know if this is ready and I can re-review.

@cjnolet cjnolet added 4 - Waiting on Author Waiting for author to respond to review and removed 4 - Waiting on Reviewer Waiting for reviewer to review or respond labels May 14, 2021
@achirkin
Copy link
Contributor Author

Sorry, I should have changed the labels. The solution I propose here is not perfect (it would still report an error when the target seems to grow due to numerical error), and the problem itself is not really pressing (wrong errors in logs). Thus, after a short discussion with @tfeher, we've decided to postpone this until we get a better idea on how to address it.

@github-actions github-actions bot added conda conda issue Cython / Python Cython or Python issue gpuCI gpuCI issue labels Jul 21, 2021
@achirkin achirkin changed the base branch from branch-21.06 to branch-21.08 July 21, 2021 07:46
@github-actions github-actions bot removed CMake gpuCI gpuCI issue conda conda issue Cython / Python Cython or Python issue labels Jul 21, 2021
@achirkin achirkin added 3 - Ready for Review Ready for review by team and removed 4 - Waiting on Author Waiting for author to respond to review labels Jul 21, 2021
@achirkin achirkin marked this pull request as ready for review July 21, 2021 12:43
@achirkin achirkin requested a review from cjnolet July 21, 2021 12:45
@achirkin
Copy link
Contributor Author

Update in the logic:

  1. If the lineseach fails, report this as a warning rather than an error.
  2. Don't report it at all, if convergence check succeeds.

Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Artem for this PR! Please update the PR description to state under which circumstances the stopping condition could differ from the earlier version. Otherwise it looks good.

cpp/src/glm/qn/qn_solvers.cuh Show resolved Hide resolved
cpp/src/glm/qn/qn_solvers.cuh Outdated Show resolved Hide resolved
@achirkin achirkin requested a review from tfeher July 22, 2021 10:48
@codecov-commenter
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.08@5b36ced). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.08    #3791   +/-   ##
===============================================
  Coverage                ?   85.72%           
===============================================
  Files                   ?      230           
  Lines                   ?    18191           
  Branches                ?        0           
===============================================
  Hits                    ?    15595           
  Misses                  ?     2596           
  Partials                ?        0           
Flag Coverage Δ
dask 48.20% <0.00%> (?)
non-dask 78.16% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5b36ced...c9aaab2. Read the comment docs.

Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Artem for the update, the PR looks good to me.

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks Artem!

@cjnolet
Copy link
Member

cjnolet commented Jul 22, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 0a6c5ba into rapidsai:branch-21.08 Jul 22, 2021
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
In some cases, the linesearch subroutine of the Quasi-Newton solver (logistic regression) may fail in a way that is tolerable for the solver. Currently, any linesearch failure makes the outer algorithm stop, and most of the time the found solution seems to be acceptable to a good precision. This proposal suggests to ignore some non-critical failures ( `LS_MAX_ITERS_REACHED` and `LS_INVALID_STEP_MIN`) and stop peacefully if the convergence check triggers or if the function change is too small.

The PR mostly changes the logging behavior, with one small exception. The last condition in `update_and_check` won't trigger if the linesearch error is not critical and if the target function goes down (i.e. the solver still advances). Previously, the solver would stop independently of the target change.

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#3791
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team bug Something isn't working CUDA/C++ non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants