Refactor the backbone builders of detection #4656
Conversation
It seems the test failure is caused by the order of initialization. If we move the extra_block initialization from after the backbone:

```python
backbone = resnet50(pretrained=pretrained_backbone, progress=progress, norm_layer=misc_nn_ops.FrozenBatchNorm2d)
backbone = _resnet_fpn_extractor(
    backbone, trainable_backbone_layers, returned_layers=[2, 3, 4], extra_blocks=LastLevelP6P7(256, 256)
)
```

to before it:

```python
extra_block = LastLevelP6P7(256, 256)
backbone = resnet50(pretrained=pretrained_backbone, progress=progress, norm_layer=misc_nn_ops.FrozenBatchNorm2d)
backbone = _resnet_fpn_extractor(
    backbone, trainable_backbone_layers, returned_layers=[2, 3, 4], extra_blocks=extra_block
)
```

all tests pass. So this is clearly a problem caused by the order of initialization given the seed, and we should be OK to replace the expected file.
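For reviewers unfamiliar with the issue, here is a minimal self-contained illustration (not from this PR) of why construction order matters under a fixed seed: each module initialization consumes values from the global RNG stream, so reordering shifts every subsequent draw.

```python
import torch
from torch import nn

def build(conv_first: bool):
    # Each nn module draws its initial weights from the global RNG, so the
    # order of construction determines which random values each one receives.
    torch.manual_seed(42)
    if conv_first:
        conv = nn.Conv2d(3, 8, 3)
        extra = nn.Linear(8, 8)
    else:
        extra = nn.Linear(8, 8)
        conv = nn.Conv2d(3, 8, 3)
    return conv

conv_a = build(conv_first=True)
conv_b = build(conv_first=False)
print(torch.equal(conv_a.weight, conv_b.weight))  # False: same seed, different order
```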
Adding some explanations to assist reviewing:
```diff
-def _resnet_backbone_config(
+def _resnet_fpn_extractor(
```
Rename for consistency.
```python
def _mobilenet_extractor(
    backbone: Union[mobilenet.MobileNetV2, mobilenet.MobileNetV3],
```
Pass a full pre-initialized MobileNetV* backbone instead of a backbone_name.
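As a sketch of the new calling convention (the helper is private and its exact signature may differ from the final code), the caller now constructs the full model and hands it to the extractor:

```python
from torchvision.models import mobilenet_v3_large
from torchvision.models.detection.backbone_utils import _mobilenet_extractor

# Sketch: build the complete MobileNetV3 first, then wrap it. Previously the
# helper received a backbone_name string and constructed the model itself.
net = mobilenet_v3_large(pretrained=False)
backbone = _mobilenet_extractor(net, fpn=True, trainable_layers=3)
```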
```diff
@@ -385,7 +388,9 @@ def fasterrcnn_resnet50_fpn(
     if pretrained:
         # no need to download the backbone if pretrained is set
         pretrained_backbone = False
-    backbone = resnet_fpn_backbone("resnet50", pretrained_backbone, trainable_layers=trainable_backbone_layers)
+    backbone = resnet50(pretrained=pretrained_backbone, progress=progress, norm_layer=misc_nn_ops.FrozenBatchNorm2d)
```
The pretrained backbone is always initialized beforehand in the builder.
```diff
-    backbone = resnet_fpn_backbone("resnet50", pretrained_backbone, trainable_layers=trainable_backbone_layers)
+    backbone = resnet50(pretrained=pretrained_backbone, progress=progress, norm_layer=misc_nn_ops.FrozenBatchNorm2d)
+    backbone = _resnet_fpn_extractor(backbone, trainable_backbone_layers)
```
It is then passed to the *_extractor method, which builds the proper backbone.
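For context, the resulting extractor behaves like any FPN backbone: its forward returns an OrderedDict of multi-scale feature maps for the detection heads to consume. A quick sketch, assuming the helper keeps the signature shown in the diff:

```python
import torch
from torchvision.models import resnet50
from torchvision.models.detection.backbone_utils import _resnet_fpn_extractor

net = resnet50(pretrained=False)
backbone = _resnet_fpn_extractor(net, trainable_layers=3)

# The FPN wrapper maps one image batch to a dict of pyramid levels.
features = backbone(torch.rand(1, 3, 224, 224))
print({name: tuple(f.shape) for name, f in features.items()})
# e.g. {'0': (1, 256, 56, 56), '1': (1, 256, 28, 28), ..., 'pool': (1, 256, 4, 4)}
```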
```diff
-        extra_blocks=LastLevelP6P7(256, 256),
-        trainable_layers=trainable_backbone_layers,
+    backbone = _resnet_fpn_extractor(
+        backbone, trainable_backbone_layers, returned_layers=[2, 3, 4], extra_blocks=LastLevelP6P7(256, 256)
```
As commented earlier, because the initialization of the backbone now happens before that of the LastLevelP6P7, the expected value of the test needs to change. I tried moving the initialization before and it works with the previous expected file, so all good.
```python
backbone = vgg.vgg16(pretrained=False, progress=progress)
if pretrained_backbone:
    state_dict = load_state_dict_from_url(backbone_urls["vgg16_features"], progress=progress)
    backbone.load_state_dict(state_dict)
```
By initializing the backbone in the builder, the complexity is reduced. Moreover, the multi-pretrained weights project will allow us to remove the if statement and pass the exact weights we want to load directly to vgg16.
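As a hypothetical sketch of where this is heading (the enum name is illustrative, not the final API), continuing from the snippet above:

```python
# Hypothetical: with the multi-weights API the manual URL load goes away and
# the desired checkpoint is passed straight to the builder.
weights = VGG16_Weights.IMAGENET1K_FEATURES if pretrained_backbone else None  # illustrative name
backbone = vgg.vgg16(weights=weights, progress=progress)
```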
```diff
@@ -23,7 +23,8 @@
 backbone_urls = {
     # We port the features of a VGG16 backbone trained by amdegroot because unlike the one on TorchVision, it uses the
     # same input standardization method as the paper. Ref: https://s3.amazonaws.com/amdegroot-models/vgg16_reducedfc.pth
-    "vgg16_features": "https://download.pytorch.org/models/vgg16_features-amdegroot.pth"
+    # Only the `features` weights have proper values, those on the `classifier` module are filled with nans.
+    "vgg16_features": "https://download.pytorch.org/models/vgg16_features-amdegroot-88682ab5.pth"
```
The previous weights only contained the values of vgg.features; the new file contains the values of the entire vgg model. Though the size increases, it allows us to load the weights with the standard mechanism instead of relying on hacks (this will become possible in the multi-pretrained weights PR).
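For context, a rough sketch (an assumption on my part, not the actual export script) of how such a checkpoint could be assembled:

```python
import torch
from torchvision.models import vgg16

model = vgg16(pretrained=False)
# Assume the `features` weights have already been copied in from amdegroot's
# checkpoint (omitted here); the `classifier` weights are then filled with
# NaNs so any accidental use of them fails loudly.
state_dict = model.state_dict()
for key, tensor in state_dict.items():
    if key.startswith("classifier"):
        tensor.fill_(float("nan"))
torch.save(state_dict, "vgg16_features-amdegroot-88682ab5.pth")
```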
```diff
@@ -183,7 +173,7 @@ def _mobilenet_extractor(
     for parameter in b.parameters():
         parameter.requires_grad_(False)

-    return SSDLiteFeatureExtractorMobileNet(backbone, stage_indices[-2], norm_layer, **kwargs)
```
Safe. No kwargs were used.
```diff
@@ -235,14 +225,16 @@ def ssdlite320_mobilenet_v3_large(
     if norm_layer is None:
         norm_layer = partial(nn.BatchNorm2d, eps=0.001, momentum=0.03)

+    backbone = mobilenet.mobilenet_v3_large(
+        pretrained=pretrained_backbone, progress=progress, norm_layer=norm_layer, reduced_tail=reduce_tail, **kwargs
```
The reduced_tail and kwargs were supposed to be passed to the mobilenet_v3_large builder.
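To make the fix concrete, here is a small check (the printed values are indicative) of what reduced_tail does now that it actually reaches the builder:

```python
from torchvision.models import mobilenet_v3_large

# reduced_tail halves the channel counts of the last blocks; before this fix
# the flag was silently dropped instead of being forwarded to the builder.
full = mobilenet_v3_large(reduced_tail=False)
reduced = mobilenet_v3_large(reduced_tail=True)
print(full.features[-1].out_channels, reduced.features[-1].out_channels)  # e.g. 960 480
```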
```python
) -> BackboneWithFPN:
    backbone = resnet.__dict__[backbone_name](weights=weights, norm_layer=norm_layer)
    return _resnet_backbone_config(backbone, trainable_layers, returned_layers, extra_blocks)
```
We no longer need to copy these files into the prototype.
Thanks @datumbox, LGTM! The many comments really help.
> different order of initialization

Do you think this is something worth mentioning in the future release notes, or nah?
@NicolasHug Thanks for the help!
Hm... I don't think we have done this before. This is probably because PyTorch doesn't guarantee getting the same random behaviour across versions. @fmassa any thoughts on this based on history? We have 3 options here:

I don't mind going for any of the above solutions. I'll merge now but happy to follow up with option 3 if needed.
I've tested all models again and their metrics look fine. I'm merging this.
Reviewed By: NicolasHug
Differential Revision: D31916220
fbshipit-source-id: bc8316bab03ed73b32cc78ae520f32938d6746cb
* Refactoring resnet_fpn backbone building.
* Passing the change to *_rcnn and retinanet.
* Applying for faster_rcnn + mobilenetv3.
* Applying for ssdlite + mobilenetv3.
* Applying for ssd + vgg16.
* Update the expected file of retinanet_resnet50_fpn to fix order of initialization.
* Adding full model weights for the VGG16 features.
Fixes #4676
Implements some of the same ideas in the detection space. More specifically, we move the CNN backbone initialization into the model builder and pass it directly to the method that implements the feature extractor for the specific task.
This PR will enable the work on #4611. Changes are BC, though we are forced to update an expected file because of the different order of initialization. See #4656 (comment) for details.
Accuracy remains the same for all pre-trained models.
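For readers who want to try the new pattern with a custom backbone (the use case of #4611), a minimal end-to-end sketch, assuming the private helper keeps the signature used throughout this PR:

```python
import torch
from torchvision.models import resnet101
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.backbone_utils import _resnet_fpn_extractor

# Any pre-built ResNet can be wrapped into an FPN extractor and handed to a
# detection head, instead of being limited to names hardcoded in the builders.
net = resnet101(pretrained=False)
backbone = _resnet_fpn_extractor(net, trainable_layers=3)
model = FasterRCNN(backbone, num_classes=91)

model.eval()
with torch.no_grad():
    predictions = model([torch.rand(3, 320, 320)])
print(predictions[0].keys())  # dict_keys(['boxes', 'labels', 'scores'])
```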