Add RegNet Architecture in TorchVision #4403
Conversation
use_se=False, **kwargs)
return _regnet("regnet_x_32gf", params, pretrained, progress, **kwargs)

# TODO(kazhang): Add RegNetZ_500MF and RegNetZ_4GF
RegNetZ is from the paper Fast and Accurate Model Scaling and is available in Classy Vision.
We would need a ConvBNReLU before the last FC layer. I will address this in the next commit.
Publishing this now to gather early feedback.
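For context, here is a minimal sketch of the kind of head change this refers to (the class name, argument names, and widths are hypothetical, not the actual ClassyVision or torchvision code): an extra Conv-BN-ReLU block sitting between the trunk output and the final FC layer.

```python
import torch
from torch import nn

# Hypothetical sketch of a RegNetZ-style head that inserts a Conv-BN-ReLU
# block before the final FC layer. Channel sizes are illustrative only.
class HeadWithConvBNReLU(nn.Module):
    def __init__(self, trunk_width: int, head_width: int, num_classes: int) -> None:
        super().__init__()
        # 1x1 conv expands the trunk output before global pooling
        self.pre_head = nn.Sequential(
            nn.Conv2d(trunk_width, head_width, kernel_size=1, bias=False),
            nn.BatchNorm2d(head_width),
            nn.ReLU(inplace=True),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(head_width, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pre_head(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        return self.fc(x)
```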
@kazhang thanks for the contribution!
You didn't specify if you wanted us to focus on the ML side or on the architectural side of the code. Given that in the past you have implemented this model on ClassyVision, below I focus only on high-level comments about the architecture. Hope it's OK.
My comments below can be grouped into 3 categories:
- I highlight whenever you introduce a block for which we already have one or more copies. This is in order to discuss whether we want to share an existing module. In the past we favoured keeping the models with minimal inheritance so that people can copy-paste and edit them easily, but this does not scale very well.
- I let you know when you use a pattern/idiom different from the one usually found in vision. All patterns have pros and cons, your usages are valid, and tastes are highly subjective. So I think this will boil down to the question: "Do we want to keep the entire codebase consistent?"
- General questions / clarifications.
Happy to provide more detailed input once we discuss these. Let me know your thoughts, happy to jump in a call to speed this up as well and later summarize in this PR.
Thanks a ton for the PR, this is looking good!
I did an initial pass and I have a few high-level comments.
Maybe we could have a joint chat with Vasilis to try to converge on some specific points raised by our comments. What do you think?
High level comment for the group - do we want to support the following?
- Fast and Accurate Model Scaling proposes a way to scale models. Once a model is scaled, it cannot be defined by w_a, w_m, etc. anymore and instead needs to be specified by the stage widths, group widths, etc., which means the RegNetParams API does not suffice. We already have models like these being trained currently, with multiple downstream applications (happy to share details internally). In Classy Vision we had an AnyNetParams API for this (https://github.com/facebookresearch/ClassyVision/blob/b9ea86b39cc469867aabb76ada33bf5a8530eb19/classy_vision/models/anynet.py#L364), which RegNetParams derived from.
- The Designing Network Design Spaces paper specified AnyNets as fairly general networks, and RegNetX, RegNetY, RegNetZ are instantiations of these. That is why we have multiple block types - a user can specify very general networks with these settings.

IMO at the very least we should support instantiating scaled models. Supporting general AnyNets might be out of scope for torchvision, I'm not sure (they're quite useful too). Answering these two questions is important for resolving some of the comments in the code currently!
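To make the distinction concrete, here is a rough sketch of the two parameterizations (the class names and fields are hypothetical, not the actual AnyNetParams/RegNetParams definitions): scaled models need the explicit per-stage form, while the generative form only covers regular RegNets.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical illustration only; the real classes live in Classy Vision and
# in this PR, and may differ in names and fields.

@dataclass
class AnyNetStyleParams:
    # Explicit per-stage configuration: this is what a scaled model needs,
    # since its widths no longer follow the generative RegNet rule.
    depths: List[int]                    # number of blocks per stage
    widths: List[int]                    # output width of each stage
    group_widths: List[int]              # group width used by each stage
    bottleneck_multipliers: List[float]
    strides: List[int]

@dataclass
class RegNetStyleParams:
    # Generative parameterization from "Designing Network Design Spaces":
    # per-stage widths are derived from (w_0, w_a, w_m) and the total depth.
    depth: int
    w_0: int
    w_a: float
    w_m: float
    group_width: int
    bottleneck_multiplier: float = 1.0
    se_ratio: Optional[float] = None
```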
cc @pdollar
We could support both scaled and regular RegNet models if we init the … Regarding the various block types, I'm leaning towards not including unused block types. From an engineering point of view, if a component is not actively used within the code base, it is likely to be ignored in regular maintenance and eventually abandoned. Additionally, it's much easier to add a component than to delete one in a BC-compatible manner.
Thanks for the updates @kazhang. I think we are getting very close, I really like the changes.
I left a couple of comments; feel free to ignore the nits, as we can address them optionally in a separate PR. Let me know what you think.
Agree on both counts! Thanks Kai!
LGTM, thanks @kazhang.
The comments I left are optional nits or discussion points, none of them are blocking. IMO you should be able to merge this once you add the weights.
What's the plan for the pre-trained models? There are lots of new variants being added; do you foresee having weights for all of them prior to the release? Alternatively, do you propose releasing the methods without the weights (it's not common, but it has been done before)? Another option is to offer builder methods only for models that have weights.
if block_type is None:
    block_type = ResBottleneckBlock
if activation is None:
    activation = nn.ReLU
I agree with @kazhang's earlier comment that this looks unnecessarily busy. It's true that in most places in vision, the idiom used is to pass None for the default layer and then assign a value. That seems not great IMO.
@fmassa any specific historical reason why we did this? Any concerns with adopting Kai's proposal and doing:
stem_type: Callable[..., nn.Module] = SimpleStemIN,
block_type: Callable[..., nn.Module] = ResBottleneckBlock,
norm_layer: Callable[..., nn.Module] = nn.BatchNorm2d,
activation: Callable[..., nn.Module] = nn.ReLU,
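To illustrate the two idioms under discussion, here is a small sketch (the block names and channel count are made up): the first defaults to None and assigns inside the constructor, the second puts the callables directly in the signature, which is safe because classes like nn.ReLU and nn.BatchNorm2d are immutable defaults.

```python
from functools import partial
from typing import Callable, Optional
from torch import nn

# Idiom currently common in torchvision: default to None, then assign.
class BlockA(nn.Module):
    def __init__(self, norm_layer: Optional[Callable[..., nn.Module]] = None,
                 activation: Optional[Callable[..., nn.Module]] = None) -> None:
        super().__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if activation is None:
            activation = nn.ReLU
        self.norm = norm_layer(64)   # 64 is an illustrative channel count
        self.act = activation(inplace=True)

# Idiom proposed in this thread: put the callable defaults in the signature.
class BlockB(nn.Module):
    def __init__(self, norm_layer: Callable[..., nn.Module] = nn.BatchNorm2d,
                 activation: Callable[..., nn.Module] = nn.ReLU) -> None:
        super().__init__()
        self.norm = norm_layer(64)
        self.act = activation(inplace=True)

# Either style lets callers customize layers, e.g. with functools.partial:
block = BlockB(norm_layer=partial(nn.GroupNorm, 8))
```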
current_width = width_out

self.trunk_output = nn.Sequential(OrderedDict(blocks))
Nit: In other models this is called features
self.trunk_output = nn.Sequential(OrderedDict(blocks))

self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
self.fc = nn.Linear(in_features=current_width, out_features=num_classes)
Nit: In some models (not all), we have a classifier attribute (see here).
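For reference, a tiny sketch of the naming alternative mentioned in the nit (the class name and widths are made up, and this is not what the PR ended up doing): the pooled features feed a layer exposed as classifier rather than fc.

```python
import torch
from torch import nn

# Illustrative only: a head that exposes `classifier` instead of `fc`,
# as some other torchvision models do. Widths are made up.
class HeadAsClassifier(nn.Module):
    def __init__(self, width: int = 512, num_classes: int = 1000) -> None:
        super().__init__()
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Linear(width, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)
```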
docs/source/models.rst
regnet_x_3_2gf = models.regnet_x_3_2gf(pretrained=True)
regnet_x_8gf = models.regnet_x_8gf(pretrained=True)
regnet_x_16gf = models.regnet_x_16gf(pretrained=True)
regnet_x_32gf = models.regnet_x_32gf(pretrained=True)
Before merging, remember to remove from this list all the models for which we don't offer weights. :)
This reverts commit 850f5f3.
@kazhang I did a final check and the PR looks good to go for me. We can follow up on the nits and todos in another PR. Two quick questions:
@datumbox yes,
I've tested the pretrained model on ImageNet:
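The measured numbers are not reproduced here. For readers who want to run a similar check, a generic top-1 accuracy loop might look like the sketch below (the dataset path, batch size, and chosen variant are placeholders; this is not the exact script or result from this PR).

```python
import torch
from torchvision import datasets, models, transforms

# Standard ImageNet validation preprocessing.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# "/path/to/imagenet/val" is a placeholder for a local ImageNet copy.
val_set = datasets.ImageFolder("/path/to/imagenet/val", transform=preprocess)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=64, num_workers=8)

# Pick whichever variant has released weights; the builder name comes from the docs snippet above.
model = models.regnet_x_16gf(pretrained=True).eval()

correct = total = 0
with torch.no_grad():
    for images, targets in val_loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == targets).sum().item()
        total += targets.numel()
print(f"top-1 accuracy: {correct / total:.4f}")
```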
Summary:
* initial code
* add SqueezeExcitation
* initial code
* add SqueezeExcitation
* add SqueezeExcitation
* regnet blocks, stems and model definition
* nit
* add fc layer
* use Callable instead of Enum for block, stem and activation
* add regnet_x and regnet_y model build functions, add docs
* remove unused depth
* use BN/activation constructor and ConvBNActivation
* add expected test pkl files
* allow custom activation in SqueezeExcitation
* use ReLU as the default activation
* initial code
* add SqueezeExcitation
* initial code
* add SqueezeExcitation
* add SqueezeExcitation
* regnet blocks, stems and model definition
* nit
* add fc layer
* use Callable instead of Enum for block, stem and activation
* add regnet_x and regnet_y model build functions, add docs
* remove unused depth
* use BN/activation constructor and ConvBNActivation
* reuse SqueezeExcitation from efficientnet
* refactor RegNetParams into BlockParams
* use nn.init, replace np with torch
* update README
* construct model with stem, block, classifier instances
* Revert "construct model with stem, block, classifier instances" (This reverts commit 850f5f3.)
* remove unused blocks
* support scaled model
* fuse into ConvBNActivation
* make reset_parameters private
* fix type errors
* fix for unit test
* add pretrained weights for 6 variant models, update docs

Reviewed By: prabhat00155, NicolasHug
Differential Revision: D31309546
fbshipit-source-id: 56e01f105279c3d3c5514607c23d2835896b3d03
Resolves #2655
cc @datumbox