
[Enhance] Learning rate in log can show the base learning rate of optimizer #1019

Merged
merged 33 commits into from
Jun 8, 2023

Conversation

AkideLiu
Contributor

@AkideLiu AkideLiu commented Mar 26, 2023

Fix for issue #482

This is Akide. We are a course group of five members (COMP SCI 4023 - Software Process Improvement) from The University of Adelaide, aiming to complete this issue and contribute. We plan to start work on it this week. If you have any further ideas, please don't hesitate to contact us.

Motivation

For example, the config of optim_wrapper is:

optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=0.0002, weight_decay=0.0001),
    clip_grad=dict(max_norm=0.1, norm_type=2),
    paramwise_cfg=dict(
        custom_keys={
            'backbone': dict(lr_mult=0.1),
            'sampling_offsets': dict(lr_mult=0.1),
            'reference_points': dict(lr_mult=0.1)
        }))

The log records the learning rate of model.backbone (2e-5) rather than the base learning rate of the optimizer (2e-4).
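
To make the behaviour concrete, here is a hypothetical minimal reproduction in plain PyTorch (the module names, shapes, and group order are illustrative assumptions on our part, not code from this PR):

import torch
from torch import nn

# Toy model: 'backbone' gets lr_mult=0.1, 'head' keeps the base lr.
model = nn.ModuleDict({'backbone': nn.Linear(4, 4), 'head': nn.Linear(4, 2)})
optimizer = torch.optim.AdamW(
    [
        {'params': model['backbone'].parameters(), 'lr': 0.0002 * 0.1},  # scaled lr (2e-5)
        {'params': model['head'].parameters()},  # falls back to the base lr (2e-4)
    ],
    lr=0.0002,
    weight_decay=0.0001)

lr_list = [group['lr'] for group in optimizer.param_groups]  # what get_lr() collects
print(lr_list)  # [2e-05, 0.0002] -> a logger that reads lr_list[0] reports 2e-05, not 2e-04

Because the scaled group happens to come first in param_groups, any consumer that simply takes the first entry ends up logging the scaled value rather than the base learning rate.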

Modification

We have conducted an initial investigation and identified the following logic that may be related to the issue at hand:

  1. Printing of logs is primarily managed by runner.message_hub.update_scalar.
  2. Logging scalars are exported to message_hub in RuntimeInfoHook::before_train_iter.
  3. The learning rate is retrieved from runner.optim_wrapper.get_lr().
  4. In OptimWrapper::get_lr, an unsorted list of learning rates is created from the optimizer parameter groups.

The core issue can potentially be resolved by sorting the learning rate list according to paramwise_cfg. At this stage, we plan to create a preliminary pull request that sorts the learning rate list without any conditional checks. We believe it is essential to define rules for automatically identifying the base learning rate among the optimizer parameter groups.
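
For reference, here is a simplified sketch of the logging path described in points 1-4 above; it reflects our understanding of the usual mmengine hook behaviour and is not the verbatim source:

class RuntimeInfoHook:
    def before_train_iter(self, runner, batch_idx, data_batch=None):
        # Sketch only: collect the learning rates from the optimizer wrapper ...
        lr_dict = runner.optim_wrapper.get_lr()  # e.g. {'lr': [2e-05, ..., 2e-04]}
        for name, lr in lr_dict.items():
            # ... and export only the first entry of the (unsorted) list to the log.
            runner.message_hub.update_scalar(f'train/{name}', lr[0])

The preliminary change to OptimWrapper.get_lr is shown below: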

    def get_lr(self) -> Dict[str, List[float]]:
        """Get the learning rate of the optimizer.

        Provide unified interface to get learning rate of optimizer.

        Returns:
            Dict[str, List[float]]: Learning rate of the optimizer.
        """
        lr = [group['lr'] for group in self.param_groups]
        lr.sort(reverse=True)  # put the largest learning rate (the base lr when all lr_mult <= 1) first
        return dict(lr=lr)
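
With this change, in the hypothetical two-group example above, get_lr() would return the base learning rate (2e-4) as the first entry of the list, which is then the value that appears in the log.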

The output shows that:

03/26 15:31:49 - mmengine - INFO - Checkpoints will be saved to /home/akide/pychram_remote/mmengine-dev/examples/work_dir.
03/26 15:31:50 - mmengine - INFO - Epoch(train) [1][  10/1563]  lr: 1.0000e-03  eta: 0:06:40  time: 0.1285  data_time: 0.0032  memory: 369  loss: 5.4107
03/26 15:31:50 - mmengine - INFO - Epoch(train) [1][  20/1563]  lr: 1.0000e-03  eta: 0:03:43  time: 0.0152  data_time: 0.0029  memory: 369  loss: 3.0671
03/26 15:31:51 - mmengine - INFO - Epoch(train) [1][  30/1563]  lr: 1.0000e-03  eta: 0:02:43  time: 0.0150  data_time: 0.0033  memory: 369  loss: 2.6957

Please let us know if you have any concerns or suggestions. We appreciate your input and look forward to collaborating on this matter.

BC-breaking (Optional)

Does the modification introduce changes that break the backward-compatibility of the downstream repos?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit test to ensure the correctness.
  3. If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDet or MMCls.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

@AkideLiu AkideLiu requested a review from HAOCHENYE as a code owner March 26, 2023 05:07
@CLAassistant

CLAassistant commented Mar 26, 2023

CLA assistant check
All committers have signed the CLA.

@codecov

codecov bot commented Mar 27, 2023

Codecov Report

Attention: Patch coverage is 66.66667% with 11 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@eb2dc67). Learn more about missing BASE report.

Files with missing lines                             Patch %   Lines
mmengine/optim/optimizer/optimizer_wrapper.py         70.00%   3 Missing and 3 partials ⚠️
mmengine/optim/optimizer/amp_optimizer_wrapper.py      0.00%   3 Missing ⚠️
mmengine/optim/optimizer/optimizer_wrapper_dict.py    50.00%   1 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1019   +/-   ##
=======================================
  Coverage        ?   77.72%           
=======================================
  Files           ?      139           
  Lines           ?    11500           
  Branches        ?     2333           
=======================================
  Hits            ?     8938           
  Misses          ?     2151           
  Partials        ?      411           
Flag        Coverage Δ
unittests   77.72% <66.66%> (?)


@AkideLiu AkideLiu requested a review from zhouzaida as a code owner April 3, 2023 17:26
@AkideLiu
Contributor Author

AkideLiu commented Apr 3, 2023

@HAOCHENYE , I have implemented an update to the optimizer parameter group to include a new parameter. However, it appears that this update has a significant impact on existing unit tests, particularly in the tests/test_optim/test_scheduler module. I am currently working on fixing these affected test cases. Could you please assist me with this task?

============================================= short test summary info =============================================
FAILED tests/test_optim/test_scheduler/test_param_scheduler.py::TestParameterSchedulerOptimWrapper::test_get_last_value - AssertionError: LR is wrong in epoch 0: expected 0.05, got 0.05
FAILED tests/test_optim/test_scheduler/test_param_scheduler.py::TestParameterSchedulerOptimWrapper::test_reduce_on_plateau_scheduler - ValueError: expected 3 min_lrs, got 2
============================= 2 failed, 202 passed, 8 skipped, 80 warnings in 11.48s ==============================

@AkideLiu AkideLiu requested a review from RangiLyu as a code owner June 2, 2023 14:39
@AkideLiu
Contributor Author

AkideLiu commented Jun 2, 2023

Hi @HAOCHENYE, we have applied a fix that passes all tests locally. Could you please review this PR?

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================ 912 passed, 88 skipped, 670 warnings in 155.40s (0:02:35) ============================

@AkideLiu
Contributor Author

AkideLiu commented Jun 4, 2023

Hi @HAOCHENYE,

We have conducted distributed training to assess whether any backward compatibility issues arise from this pull request (PR). Fortunately, we have found no breaking changes that would affect downstream repositories. We tested the current PR using MMSeg with 4 GPUs, and the attached logs provide more details about the testing process.

20230604_171640.log

@AkideLiu AkideLiu requested a review from HAOCHENYE June 5, 2023 15:36
@AkideLiu
Contributor Author

AkideLiu commented Jun 6, 2023

Hi @HAOCHENYE, I have merged your refine branch into this PR. Please let me know if there is anything else we can improve.

@zhouzaida
Collaborator

Hi @AkideLiu, should we also update the logic for getting the lr in

lr_dict[f'{name}.lr'] = optim_wrapper.get_lr()['lr']

@AkideLiu
Contributor Author

AkideLiu commented Jun 8, 2023

@zhouzaida Sure, I have updated the logic of get_lr in optimizer_wrapper_dict.
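
For illustration only, one possible shape of such an update, prefixing each wrapper's lr keys with its name in line with the snippet quoted above; this is an assumption on our part, not necessarily the code that was merged:

class OptimWrapperDictSketch:
    # Hypothetical stand-in for mmengine's OptimWrapperDict, kept minimal for illustration.
    def __init__(self, **optim_wrappers):
        self.optim_wrappers = optim_wrappers  # name -> OptimWrapper-like object

    def get_lr(self):
        lr = dict()
        for name, optim_wrapper in self.optim_wrappers.items():
            inner = optim_wrapper.get_lr()  # each wrapper returns a dict like {'lr': [...]}
            # Prefix every key with the wrapper name, e.g. 'generator.lr'.
            lr.update({f'{name}.{k}': v for k, v in inner.items()})
        return lr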

@AkideLiu AkideLiu requested a review from zhouzaida June 8, 2023 07:46
@zhouzaida zhouzaida added this to the 0.8.0 milestone Jun 8, 2023
@AkideLiu AkideLiu requested a review from zhouzaida June 8, 2023 09:28
zhouzaida
zhouzaida previously approved these changes Jun 8, 2023
@zhouzaida
Collaborator

Hi @AkideLiu, thanks for your contribution. This PR will be merged after passing the CI.

@zhouzaida zhouzaida changed the title [Fix][Bug] Learning rate in log will show the first params_group in optimizer, rather than the base learning rate of optimizer [Enhance] Learning rate in log can show the base learning rate of optimizer Jun 8, 2023
@zhouzaida zhouzaida merged commit 94e7a3b into open-mmlab:main Jun 8, 2023
@AkideLiu
Contributor Author

AkideLiu commented Jun 8, 2023

Hi @HAOCHENYE @zhouzaida, we appreciate your efforts and support during the development of this PR, and we hope the mmengine project has a great future.
