QN solvers: Use different gradient norms for different loss functions. #4491

Merged
5 commits merged into rapidsai:branch-22.04 on Feb 10, 2022

Conversation

achirkin
Contributor

@achirkin achirkin commented Jan 17, 2022

Different loss functions may scale differently with the number of features. This has an effect on the convergence criteria. To account for that, I let a loss function define its preferred metric. As a result, the number of iterations should be less dependent on the number of features for all loss functions.
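
To illustrate the idea, here is a minimal sketch (not the actual cuML code) of how a convergence check can defer to a norm chosen by the loss. The class names, the `grad_norm` method, and the pairing of each loss with a norm are all hypothetical and chosen only for illustration. The example shows that, for a gradient whose components all have the same magnitude, the L2 norm grows with the square root of the number of features while the max norm does not, so the same tolerance behaves very differently under the two norms.

```python
import math


def l2_norm(grad):
    """Euclidean norm; grows like sqrt(n) when all n components are equal."""
    return math.sqrt(sum(g * g for g in grad))


def linf_norm(grad):
    """Max norm; independent of the number of components."""
    return max(abs(g) for g in grad)


class LossWithL2Norm:
    """Hypothetical loss that prefers the L2 norm of its gradient."""

    @staticmethod
    def grad_norm(grad):
        return l2_norm(grad)


class LossWithLinfNorm:
    """Hypothetical loss that prefers the max norm of its gradient."""

    @staticmethod
    def grad_norm(grad):
        return linf_norm(grad)


def has_converged(loss, grad, tol):
    """Generic stopping test: the loss decides how its gradient is measured."""
    return loss.grad_norm(grad) <= tol


def uniform_gradient(n_features, magnitude=1e-3):
    """Toy gradient (an assumption): every component has the same magnitude."""
    return [magnitude] * n_features


if __name__ == "__main__":
    for n_features in (10, 10_000):
        grad = uniform_gradient(n_features)
        print(
            n_features,
            has_converged(LossWithL2Norm, grad, tol=5e-3),
            has_converged(LossWithLinfNorm, grad, tol=5e-3),
        )
```

With the same tolerance, the L2-based check passes for 10 features but fails for 10,000, while the max-norm check passes in both cases. Letting each loss pick the norm that matches how its gradient scales is one way to keep the stopping criterion comparable across problem sizes.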

@achirkin achirkin added the labels 2 - In Progress (Currently a work in progress), non-breaking (Non-breaking change), and improvement (Improvement / enhancement to an existing function) Jan 17, 2022
@achirkin achirkin marked this pull request as ready for review January 18, 2022 11:14
@achirkin achirkin requested a review from a team as a code owner January 18, 2022 11:14
@achirkin achirkin requested a review from tfeher January 18, 2022 11:14
@achirkin achirkin added the label 3 - Ready for Review (Ready for review by team) and removed the label 2 - In Progress (Currently a work in progress) Jan 18, 2022
@achirkin achirkin changed the title from "Use different gradient norms for different loss functions." to "QN solvers: Use different gradient norms for different loss functions." Jan 18, 2022
@codecov-commenter

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.02@d0cbcd9).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-22.02    #4491   +/-   ##
===============================================
  Coverage                ?   85.77%           
===============================================
  Files                   ?      236           
  Lines                   ?    19314           
  Branches                ?        0           
===============================================
  Hits                    ?    16567           
  Misses                  ?     2747           
  Partials                ?        0           
Flag Coverage Δ
dask 46.54% <0.00%> (?)
non-dask 78.67% <0.00%> (?)

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d0cbcd9...46bd852.

@cjnolet
Member

cjnolet commented Jan 19, 2022

rerun tests

1 similar comment
@achirkin
Contributor Author

rerun tests

@achirkin achirkin changed the base branch from branch-22.02 to branch-22.04 January 24, 2022 14:41
Member

@cjnolet cjnolet left a comment

LGTM. The changes are straightforward.

@achirkin
Contributor Author

rerun tests

@achirkin
Contributor Author

achirkin commented Feb 3, 2022

rerun tests

1 similar comment
@cjnolet
Member

cjnolet commented Feb 3, 2022

rerun tests

Contributor

@tfeher tfeher left a comment

Thanks Artem for the PR, it looks good to me.

@achirkin
Contributor Author

achirkin commented Feb 8, 2022

rerun tests

@cjnolet
Member

cjnolet commented Feb 10, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 06b1f92 into rapidsai:branch-22.04 Feb 10, 2022
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
…loss functions. (rapidsai#4491)

Different loss functions may scale differently with the number of features. This has an effect on the convergence criteria. To account for that, I let a loss function define its preferred metric. As a result, the number of iterations should be less dependent on the number of features for all loss functions.

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Tamas Bela Feher (https://github.com/tfeher)

URL: rapidsai#4491
Labels
3 - Ready for Review (Ready for review by team), CUDA/C++, improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)
4 participants