Change optimizer parameters group method #1239

Conversation

@hoonyyhoon (Contributor) commented Oct 29, 2020

What does this PR do?

This PR changes the current way of dividing model parameters into three optimizer parameter groups.
The current approach groups parameters by their names, which is a potential source of bugs.
Replacing the name checks with hasattr and isinstance checks prevents potential mis-grouping caused by naming conventions and gives a more general grouping method.

Test

Simple test code that checks the new grouping produces the same result as the current one, by appending module names to lists and comparing them.
Tested on yolov5x.yaml, yolov5l.yaml, yolov5m.yaml, yolov5s.yaml

import torch
import torch.nn as nn

# 'model' is an instantiated YOLOv5 model built from one of the tested .yaml configs
pg0, pg1, pg2 = [], [], []  # optimizer parameter groups (new method)
pg0_old, pg1_old, pg2_old = [], [], []  # optimizer parameter groups (current method)

# Changes: group by attribute and module type
for k, v in model.named_modules():
    if hasattr(v, 'bias') and isinstance(v.bias, torch.Tensor):
        pg2.append(k + '.bias')  # biases
    if isinstance(v, nn.BatchNorm2d):
        pg0.append(k + '.weight')  # BatchNorm2d weights (no decay)
    elif hasattr(v, 'weight') and isinstance(v.weight, torch.Tensor):
        pg1.append(k + '.weight')  # other weights (apply decay)

# Current: group by parameter name
for k, v in model.named_parameters():
    v.requires_grad = True  # train all layers
    if '.bias' in k:
        pg2_old.append(k)  # biases
    elif '.weight' in k and '.bn' not in k:
        pg1_old.append(k)  # apply weight decay
    else:
        pg0_old.append(k)  # everything else (BatchNorm weights)

# Test: both methods should produce identical groups
assert pg0 == pg0_old
assert pg1 == pg1_old
assert pg2 == pg2_old
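
For context, here is a minimal sketch (not part of this PR) of how three such groups are typically handed to the optimizer once they hold the parameters themselves rather than their names; the lr, momentum, and weight_decay values below are illustrative only.

import torch
import torch.nn as nn
import torch.optim as optim

pg0, pg1, pg2 = [], [], []  # BatchNorm weights, other weights, biases
for v in model.modules():   # 'model' is any nn.Module, e.g. an instantiated YOLOv5 model
    if hasattr(v, 'bias') and isinstance(v.bias, torch.Tensor):
        pg2.append(v.bias)
    if isinstance(v, nn.BatchNorm2d):
        pg0.append(v.weight)
    elif hasattr(v, 'weight') and isinstance(v.weight, torch.Tensor):
        pg1.append(v.weight)

optimizer = optim.SGD(pg0, lr=0.01, momentum=0.937, nesterov=True)  # no weight decay on BN weights
optimizer.add_param_group({'params': pg1, 'weight_decay': 5e-4})    # weight decay on conv/linear weights only
optimizer.add_param_group({'params': pg2})                          # no weight decay on biases

Keeping weight decay off BatchNorm weights and biases is a common convention; the decay setting applies only to the pg1 group.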

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Refinement to layer freezing and parameter grouping in YOLOv5 training script. 🛠️

📊 Key Changes

  • Removed the previously unused freeze list containing parameter names to freeze.
  • Enabled training (requires_grad=True) for all layers by default.
  • Simplified the logic for freezing layers based on the freeze list.
  • Adjusted the optimizer parameter grouping to differentiate between biases, batch normalization weights, and other weights more accurately.

🎯 Purpose & Impact

  • The changes ensure all layers are trainable by default, offering flexibility to freeze specific layers if needed (see the freeze sketch after this summary). 🗃️
  • The update provides clearer distinction among parameter groups, which may aid in the optimization process and allow more precise tuning of the training routine. ⚙️
  • Users may experience a more intuitive setup when customizing which parts of their model to train, potentially leading to improved model performance. 🎯
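
For reference, a minimal sketch of the kind of freeze logic the summary describes; the freeze entries and the model variable are illustrative assumptions, not code taken from this PR.

# Freeze selected layers by (partial) parameter name; everything else stays trainable.
freeze = ['model.0.', 'model.1.']  # illustrative: prefixes of parameter names to freeze
for k, v in model.named_parameters():  # 'model' is an instantiated model, e.g. YOLOv5
    v.requires_grad = True  # train all layers by default
    if any(x in k for x in freeze):
        print(f'freezing {k}')
        v.requires_grad = False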

@github-actions github-actions bot left a comment
Hello @hoonyyhoon, thank you for submitting a PR! To allow your work to be integrated as seamlessly as possible, we advise you to:

  • Verify your PR is up-to-date with origin/master. If your PR is behind origin/master, update it by running the following, replacing 'feature' with the name of your local branch:
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
git checkout feature  # <----- replace 'feature' with local branch name
git rebase upstream/master
git push -u origin -f
  • Verify all Continuous Integration (CI) checks are passing.
  • Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee

@hoonyyhoon hoonyyhoon force-pushed the change_optimizer_param_group_method branch from 17e2707 to 777b1e1 Compare October 29, 2020 09:45
@glenn-jocher (Member) commented

@hoonyyhoon very interesting! Thanks for the PR.

@glenn-jocher glenn-jocher self-requested a review October 30, 2020 15:31
@glenn-jocher (Member) commented

@hoonyyhoon should torch.Tensor be replaced with nn.Parameter?

This might prevent non-gradient tensors from joining the param groups. The Detect() layer is an example: it holds the anchors as registered buffers, not as parameters (they are not modified by the loss).
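
For illustration, a minimal sketch of the distinction using a toy module (this is not the actual YOLOv5 Detect layer): a registered buffer passes an isinstance(..., torch.Tensor) check but not an isinstance(..., nn.Parameter) check.

import torch
import torch.nn as nn

class ToyDetect(nn.Module):
    def __init__(self):
        super().__init__()
        self.m = nn.Conv2d(128, 255, kernel_size=1)          # weight and bias are nn.Parameter
        self.register_buffer('anchors', torch.zeros(3, 2))   # buffer: plain tensor, no gradient

d = ToyDetect()
print(isinstance(d.anchors, torch.Tensor), isinstance(d.anchors, nn.Parameter))    # True False
print(isinstance(d.m.weight, torch.Tensor), isinstance(d.m.weight, nn.Parameter))  # True True

Since nn.Parameter subclasses torch.Tensor, the stricter check keeps trainable parameters and filters out buffers.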

@hoonyyhoon (Contributor, Author) commented Oct 31, 2020

@glenn-jocher
Here is a simple printout of named_modules():

for k, v in model.named_modules():
    print(f"{k} // {v}")
...
...
model.23.m.0 // Bottleneck(
  (cv1): Conv(
    (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
    (act): Hardswish()
  )
  (cv2): Conv(
    (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn): BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
    (act): Hardswish()
  )
)
model.23.m.0.cv1 // Conv(
  (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
  (bn): BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
  (act): Hardswish()
)
model.23.m.0.cv1.conv // Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
model.23.m.0.cv1.bn // BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.23.m.0.cv1.act // Hardswish()
model.23.m.0.cv2 // Conv(
  (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn): BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
  (act): Hardswish()
)
model.23.m.0.cv2.conv // Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
model.23.m.0.cv2.bn // BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.23.m.0.cv2.act // Hardswish()

## Detect module

model.24 // Detect(
  (m): ModuleList(
    (0): Conv2d(128, 255, kernel_size=(1, 1), stride=(1, 1))
    (1): Conv2d(256, 255, kernel_size=(1, 1), stride=(1, 1))
    (2): Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))
  )
)
model.24.m // ModuleList(
  (0): Conv2d(128, 255, kernel_size=(1, 1), stride=(1, 1))
  (1): Conv2d(256, 255, kernel_size=(1, 1), stride=(1, 1))
  (2): Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))
)
model.24.m.0 // Conv2d(128, 255, kernel_size=(1, 1), stride=(1, 1))
model.24.m.1 // Conv2d(256, 255, kernel_size=(1, 1), stride=(1, 1))
model.24.m.2 // Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))

As you already know, named_modules() iterates over the modules recursively.
The hasattr call checks whether a module has the attribute in question (e.g. bias or weight). I added the second condition (isinstance) because some modules hold bias or weight as None (e.g. nn.Conv2d with bias=False).
Unless a buffer is registered under the name weight or bias it would not be a problem, and I thought that would be a really rare case.

But what you suggested seems safer to me as well.

So, TL;DR: what you suggested seems better to me.
Thank you for reviewing!
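
To illustrate the None-bias case described above, a minimal check (assuming only torch and torch.nn):

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3, bias=False)
print(hasattr(conv, 'bias'))                # True  -> hasattr alone would include this module
print(conv.bias is None)                    # True  -> the bias attribute exists but is None
print(isinstance(conv.bias, torch.Tensor))  # False -> the isinstance check filters it out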

@hoonyyhoon (Contributor, Author) commented

Tested with assertions as follows.

import torch
import torch.nn as nn

pg0, pg1, pg2 = [], [], []  # optimizer parameter groups
for k, v in model.named_modules():
    if hasattr(v, 'bias'):
        # for this model, the nn.Parameter and torch.Tensor checks agree
        assert isinstance(v.bias, nn.Parameter) == isinstance(v.bias, torch.Tensor)
    if isinstance(v, nn.BatchNorm2d):
        pg0.append(v.weight)
    elif hasattr(v, 'weight'):
        assert isinstance(v.weight, nn.Parameter) == isinstance(v.weight, torch.Tensor)
        pg1.append(v.weight)  # apply weight decay

@glenn-jocher (Member) commented

Adding code to resolve TODO #679 to this PR. We'll kill two birds with one stone.

@glenn-jocher glenn-jocher linked an issue Nov 1, 2020 that may be closed by this pull request
@glenn-jocher glenn-jocher added the enhancement New feature or request label Nov 1, 2020
@glenn-jocher (Member) commented

/rebase

@github-actions github-actions bot force-pushed the change_optimizer_param_group_method branch from b66ca24 to e0000f7 Compare November 1, 2020 22:36
@glenn-jocher (Member) commented

Changes look good, waiting on CI checks to merge.

@glenn-jocher (Member) commented

Bug fixed, checks passing, merging this PR now.

@hoonyyhoon thank you for your contributions! I think this update will provide more robust parameter group settings going forward, and the code is now more widely applicable to future use cases. Nice job!

@glenn-jocher glenn-jocher merged commit 187f7c2 into ultralytics:master Nov 1, 2020
burglarhobbit pushed a commit to burglarhobbit/yolov5 that referenced this pull request Jan 1, 2021
* Change optimizer parameters group method

* Add torch nn

* Change isinstance method(torch.Tensor to nn.Parameter)

* parameter freeze fix, PEP8 reformat

* freeze bug fix

Co-authored-by: Glenn Jocher <[email protected]>
KMint1819 pushed a commit to KMint1819/yolov5 that referenced this pull request May 12, 2021
BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Transfer Learning - Freezing Parameters
2 participants