Change optimizer parameters group method #1239
Conversation
Hello @hoonyyhoon, thank you for submitting a PR! To allow your work to be integrated as seamlessly as possible, we advise you to:
- Verify your PR is up-to-date with origin/master. If your PR is behind origin/master, update by running the following, replacing 'feature' with the name of your local branch:
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
git checkout feature # <----- replace 'feature' with local branch name
git rebase upstream/master
git push -u origin -f
- Verify all Continuous Integration (CI) checks are passing.
- Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee
17e2707 to 777b1e1 (Compare)
@hoonyyhoon very interesting! Thanks for the PR.
@hoonyyhoon should the isinstance() checks test for nn.Parameter rather than torch.Tensor? This might prevent non-gradient tensors from joining the param groups. There are examples of this in the Detect() layer, which holds the anchors as registered buffers, not as parameters (they are not modified by the loss).
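To illustrate the buffer-versus-parameter distinction, here is a minimal standalone sketch (the DummyDetect module below is made up for illustration and is not code from this repository): a tensor registered with register_buffer(), like the anchors in Detect(), is a plain torch.Tensor but not an nn.Parameter, so an isinstance(..., nn.Parameter) check keeps it out of the optimizer groups.

import torch
import torch.nn as nn

class DummyDetect(nn.Module):
    """Toy stand-in for Detect(): anchors are a registered buffer, not a parameter."""
    def __init__(self):
        super().__init__()
        self.m = nn.Conv2d(128, 255, kernel_size=1)             # learnable conv (weight + bias)
        self.register_buffer('anchors', torch.zeros(3, 3, 2))   # constant, not updated by the loss

d = DummyDetect()
print(isinstance(d.anchors, torch.Tensor))   # True  -> a plain torch.Tensor check would collect it
print(isinstance(d.anchors, nn.Parameter))   # False -> an nn.Parameter check filters it out
print(isinstance(d.m.weight, nn.Parameter))  # True  -> real weights are still collected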
@glenn-jocher This is the (truncated) output of:

for k, v in model.named_modules():
    print(f"{k} // {v}")

...
model.23.m.0 // Bottleneck(
(cv1): Conv(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
(act): Hardswish()
)
(cv2): Conv(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
(act): Hardswish()
)
)
model.23.m.0.cv1 // Conv(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
(act): Hardswish()
)
model.23.m.0.cv1.conv // Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
model.23.m.0.cv1.bn // BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.23.m.0.cv1.act // Hardswish()
model.23.m.0.cv2 // Conv(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
(act): Hardswish()
)
model.23.m.0.cv2.conv // Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
model.23.m.0.cv2.bn // BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.23.m.0.cv2.act // Hardswish()
## Detect module
model.24 // Detect(
(m): ModuleList(
(0): Conv2d(128, 255, kernel_size=(1, 1), stride=(1, 1))
(1): Conv2d(256, 255, kernel_size=(1, 1), stride=(1, 1))
(2): Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))
)
)
model.24.m // ModuleList(
(0): Conv2d(128, 255, kernel_size=(1, 1), stride=(1, 1))
(1): Conv2d(256, 255, kernel_size=(1, 1), stride=(1, 1))
(2): Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))
)
model.24.m.0 // Conv2d(128, 255, kernel_size=(1, 1), stride=(1, 1))
model.24.m.1 // Conv2d(256, 255, kernel_size=(1, 1), stride=(1, 1))
model.24.m.2 // Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))

As you already know, named_modules() iterates over modules recursively. But what you suggested seems safer to me as well, so TL;DR: your suggestion seems better to me.
Tested with assertion as follows:

import torch
import torch.nn as nn

pg0, pg1, pg2 = [], [], []  # optimizer parameter groups
for k, v in model.named_modules():
    if hasattr(v, 'bias'):
        # the nn.Parameter and torch.Tensor checks must agree for every bias attribute
        assert isinstance(v.bias, nn.Parameter) == isinstance(v.bias, torch.Tensor)
        if isinstance(v.bias, nn.Parameter):
            pg2.append(v.bias)  # biases
    if isinstance(v, nn.BatchNorm2d):
        pg0.append(v.weight)  # BatchNorm2d weights (no decay)
    elif hasattr(v, 'weight'):
        assert isinstance(v.weight, nn.Parameter) == isinstance(v.weight, torch.Tensor)
        pg1.append(v.weight)  # apply weight decay
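For completeness, a sketch of how three such groups are typically handed to the optimizer with different settings; the hyperparameter values below are placeholders, not the values this repository actually uses:

import torch.optim as optim

optimizer = optim.SGD(pg0, lr=0.01, momentum=0.937, nesterov=True)  # BatchNorm weights, no decay
optimizer.add_param_group({'params': pg1, 'weight_decay': 5e-4})    # other weights, with decay
optimizer.add_param_group({'params': pg2})                          # biases, no decay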
Adding code to resolve TODO #679 to this PR. We'll kill two birds with one stone.
/rebase
b66ca24 to e0000f7 (Compare)
Changes look good, waiting on CI checks to merge.
Bug fixed, checks passing, merging this PR now. @hoonyyhoon thank you for your contributions! I think this update will provide more robust parameter group settings going forward, and the code is now more widely applicable to future use cases. Nice job!
* Change optimizer parameters group method
* Add torch nn
* Change isinstance method (torch.Tensor to nn.Parameter)
* parameter freeze fix, PEP8 reformat
* freeze bug fix

Co-authored-by: Glenn Jocher <[email protected]>
What does this PR do?
This PR changes how model parameters are divided into the three optimizer parameter groups. The current approach divides parameters by their names, which is a potential source of bugs. Replacing name matching with hasattr/isinstance checks prevents mis-grouping caused by naming and gives a more general grouping method; for contrast, a sketch of the fragile name-based pattern is shown below.
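The sketch below shows what purely name-based grouping looks like; the substring patterns are illustrative assumptions, not necessarily the exact previous code in this repository. The attribute/type-based replacement is the named_modules() loop shown in the conversation above.

# Fragile: grouping keys off substrings of the parameter names, so a custom module
# that names its attributes differently silently lands in the wrong group.
pg0, pg1, pg2 = [], [], []  # remaining params / weights with decay / biases
for k, v in model.named_parameters():
    if '.bias' in k:
        pg2.append(v)  # biases
    elif '.weight' in k and '.bn' not in k:
        pg1.append(v)  # weights that should receive weight decay
    else:
        pg0.append(v)  # remaining parameters (e.g. BatchNorm weights)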
Test
A simple test checks that the new code produces the same grouping as before by appending the module names to lists and comparing them.
Tested on yolov5x.yaml, yolov5l.yaml, yolov5m.yaml, and yolov5s.yaml.
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Refinement to layer freezing and parameter grouping in YOLOv5 training script. 🛠️
📊 Key Changes
- A freeze list containing the names of parameters to freeze.
- Parameters whose names match an entry in the freeze list are excluded from training (a minimal sketch follows below).
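A minimal sketch of this freezing pattern (the freeze entries and print format below are hypothetical examples, not the repository's actual values): parameters whose names contain an entry from the freeze list get requires_grad = False, so the optimizer never updates them.

freeze = ['model.0.', 'model.1.']  # hypothetical example: freeze the first two layers
for k, v in model.named_parameters():
    v.requires_grad = True  # train everything by default
    if any(x in k for x in freeze):
        print(f'freezing {k}')
        v.requires_grad = False  # excluded from gradient updates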
🎯 Purpose & Impact