Add RegNet Architecture in TorchVision #4403

Merged (42 commits, Sep 29, 2021)

Conversation

@kazhang (Contributor) commented Sep 14, 2021

Resolves #2655

cc @datumbox

use_se=False, **kwargs)
return _regnet("regnet_x_32gf", params, pretrained, progress, **kwargs)

# TODO(kazhang): Add RegNetZ_500MF and RegNetZ_4GF
@kazhang (Contributor, Author) commented Sep 15, 2021

RegNetZ comes from the paper Fast and Accurate Model Scaling and is available in Classy Vision.
We would need a ConvBNReLU before the last FC layer. I will address this in the next commit.
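A minimal sketch of what such a head could look like; the class name and widths below are hypothetical and only illustrate the idea, not the final implementation:

```python
# Hypothetical sketch: a ConvBNReLU projection inserted before global pooling
# and the final fully-connected layer, as RegNetZ would require.
import torch.nn as nn

class ConvBNReLUHead(nn.Sequential):
    def __init__(self, in_width: int, head_width: int, num_classes: int) -> None:
        super().__init__(
            nn.Conv2d(in_width, head_width, kernel_size=1, bias=False),
            nn.BatchNorm2d(head_width),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(head_width, num_classes),
        )
```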

@kazhang marked this pull request as ready for review on September 15, 2021 22:21
@kazhang (Contributor, Author) commented Sep 15, 2021

Publishing this to gather early feedback.
cc @fmassa @datumbox @mannatsingh

@kazhang requested a review from fmassa on September 15, 2021 22:22
@datumbox (Contributor) left a comment

@kazhang thanks for the contribution!

You didn't specify if you wanted us to focus on the ML side or on the architectural side of the code. Given that in the past you have implemented this model on ClassyVision, below I focus only on high-level comments about the architecture. Hope it's OK.

My comments below can be grouped into 3 categories:

  1. I highlight whenever you introduce a block for which we have one or more copies. This is in order to discuss whether we want to share an existing module. In the past we favoured keeping the models with minimal inheritance so that people can copy-paste and edit them easily, but this does not scale very well.
  2. I let you know when you use a different pattern/idiom than the one usually found in torchvision. All patterns have pros/cons, your usages are valid, and tastes are highly subjective. So I think this boils down to the question "Do we want to keep the entire codebase consistent?"
  3. General questions / clarifications.

Happy to provide more detailed input once we discuss these. Let me know your thoughts, happy to jump in a call to speed this up as well and later summarize in this PR.

(7 review threads on torchvision/models/regnet.py, outdated and resolved)
@fmassa (Member) left a comment

Thanks a ton for the PR, this is looking good!

I did an initial pass and I have a few high-level comments.

Maybe we could have a joint chat with Vasilis to try and converge on the specific points brought up by our comments. What do you think?

(10 review threads on torchvision/models/regnet.py, outdated and resolved)
@mannatsingh left a comment

High-level comment for the group: do we want to support the following (quoted in full in the reply below)?

IMO at the very least we should support instantiating scaled models. Supporting general AnyNets might be out of scope for torchvision, I'm not sure (they're quite useful too). Answering these two questions is important for resolving some of the current comments in the code!

cc @pdollar

(3 review threads on torchvision/models/regnet.py, outdated and resolved)
@kazhang (Contributor, Author) commented Sep 21, 2021

> High-level comment for the group: do we want to support the following?
>
> * [Fast and Accurate Model Scaling](https://arxiv.org/abs/2103.06877) proposes a way to scale models. Once a model is scaled, it cannot be defined by `w_a`, `w_m`, etc. anymore and instead needs to be specified by the stage widths, group widths, etc., which means the `RegNetParams` API does not suffice. We already have models like these being trained currently with multiple downstream applications (happy to share details internally). In Classy Vision we had an `AnyNetParams` API for this (https://github.com/facebookresearch/ClassyVision/blob/b9ea86b39cc469867aabb76ada33bf5a8530eb19/classy_vision/models/anynet.py#L364), from which `RegNetParams` derived.
> * The [Designing Network Design Spaces](https://arxiv.org/abs/2003.13678) paper specified AnyNets as fairly general networks, and `RegNetX`, `RegNetY`, `RegNetZ` are instantiations of these. That is why we have multiple block types: a user can specify very general networks with these settings.

We could support both scaled and regular RegNet models if we initialize RegNetParams with lists of per-stage parameters and add a class method that computes those lists from the generation parameters.
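A rough sketch of this idea (field and method names are hypothetical placeholders, not the final API): the constructor takes explicit per-stage lists, so scaled models can be specified directly, while a class method derives those lists from the generation parameters (w_0, w_a, w_m, depth) using the quantized linear parameterization from Designing Network Design Spaces:

```python
# Hypothetical sketch: per-stage lists in the constructor, plus a class method
# that derives them from the RegNet generation parameters.
import math
from dataclasses import dataclass
from typing import List


@dataclass
class RegNetParams:
    depths: List[int]   # number of blocks per stage
    widths: List[int]   # output width per stage
    group_width: int

    @classmethod
    def from_init_params(cls, depth: int, w_0: int, w_a: float, w_m: float,
                         group_width: int) -> "RegNetParams":
        # Quantized linear widths from the paper: u_j = w_0 + w_a * j, snapped
        # to the nearest power of w_m relative to w_0, rounded to a multiple of
        # 8, then collapsed into stages of equal width.
        widths_cont = [w_0 + w_a * j for j in range(depth)]
        exponents = [round(math.log(u / w_0, w_m)) for u in widths_cont]
        block_widths = [int(round(w_0 * w_m ** e / 8) * 8) for e in exponents]

        stage_widths: List[int] = []
        stage_depths: List[int] = []
        for w in block_widths:
            if stage_widths and stage_widths[-1] == w:
                stage_depths[-1] += 1
            else:
                stage_widths.append(w)
                stage_depths.append(1)
        return cls(depths=stage_depths, widths=stage_widths, group_width=group_width)
```

A scaled model that can no longer be described by (w_0, w_a, w_m) would then be constructed directly with explicit depths and widths instead of going through the class method.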

Regarding the various block types, I'm leaning towards not including unused ones. From an engineering point of view, if a component is not actively used within the code base, it is likely to be ignored during regular maintenance and eventually abandoned. Additionally, it's much easier to add a component later than to delete one in a backwards-compatible manner.

@datumbox (Contributor) left a comment

Thanks for the updates @kazhang. I think we are getting very close, I really like the changes.

I left a couple of comments; feel free to ignore the nits, as we can address them optionally in a separate PR. Let me know what you think.

(9 review threads on torchvision/models/regnet.py and 1 on torchvision/models/efficientnet.py, outdated and resolved)
@mannatsingh commented Sep 22, 2021

> We could support both scaled and regular RegNet models if we initialize RegNetParams with lists of per-stage parameters and add a class method that computes those lists from the generation parameters.
>
> Regarding the various block types, I'm leaning towards not including unused ones. From an engineering point of view, if a component is not actively used within the code base, it is likely to be ignored during regular maintenance and eventually abandoned. Additionally, it's much easier to add a component later than to delete one in a backwards-compatible manner.

Agreed on both counts! Thanks Kai!

@datumbox (Contributor) left a comment

LGTM, thanks @kazhang.

The comments I left are optional nits or discussion points; none of them are blocking. IMO you should be able to merge this once you add the weights.

What's the plan for the pre-trained models? There are lots of new variants being added; do you foresee having weights for all of them prior to the release? Alternatively, do you propose releasing the methods without the weights (it's not common, but it has been done before)? Another option is to offer builder methods only for models that have weights.

if block_type is None:
block_type = ResBottleneckBlock
if activation is None:
activation = nn.ReLU
Review comment (Contributor):

I agree with @kazhang's earlier comment that this looks unnecessarily busy. It's true that in most places in torchvision, the idiom is to pass None for the default layer and then assign a value. That seems not great IMO.

@fmassa, is there any specific historical reason why we did this? Any concerns with adopting Kai's proposal and doing:

        stem_type: Callable[..., nn.Module] = SimpleStemIN,
        block_type: Callable[..., nn.Module] = ResBottleneckBlock,
        norm_layer: Callable[..., nn.Module] = nn.BatchNorm2d,
        activation: Callable[..., nn.Module] = nn.ReLU,
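For reference, a minimal sketch contrasting the two idioms under discussion (the class and attribute names are illustrative only):

```python
from typing import Callable, Optional

import torch.nn as nn


class WithNoneDefault(nn.Module):
    # Idiom common in torchvision today: default to None and resolve in __init__.
    def __init__(self, norm_layer: Optional[Callable[..., nn.Module]] = None) -> None:
        super().__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self.norm = norm_layer(16)


class WithCallableDefault(nn.Module):
    # Proposal discussed above: use the callable itself as the default value.
    def __init__(self, norm_layer: Callable[..., nn.Module] = nn.BatchNorm2d) -> None:
        super().__init__()
        self.norm = norm_layer(16)
```

Both variants build the same module; the difference is only in how the default is expressed in the signature.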


current_width = width_out

self.trunk_output = nn.Sequential(OrderedDict(blocks))
Review comment (Contributor):

Nit: in other models this is called `features`.

self.trunk_output = nn.Sequential(OrderedDict(blocks))

self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
self.fc = nn.Linear(in_features=current_width, out_features=num_classes)
Review comment (Contributor):

Nit: in some models (not all), we have a `classifier` attribute (see here).

regnet_x_3_2gf = models.regnet_x_3_2gf(pretrained=True)
regnet_x_8gf = models.regnet_x_8gf(pretrained=True)
regnet_x_16gf = models.regnet_x_16gf(pretrained=True)
regnet_x_32gf = models.regnet_x_32gf(pretrained=True)
Review comment (Contributor):

Before merging, remember to remove from this list all models for which we don't offer weights. :)

@datumbox (Contributor) commented:

@kazhang I did a final check and the PR looks good to go for me. We can follow up on the nits and todos in another PR.

Two quick questions:

  • Your implementation contains classes such as SimpleStemIN, BottleneckTransform, etc., which I understand are not declared in the __all__ variable. This is a very common pattern in TorchVision, BTW. I assume that you consider these classes not part of the public API but chose not to make them fully private (by prepending an _) because other developers might be interested in using them. Can you confirm this is the case? (A short illustrative sketch of this export pattern follows this list.)
  • Have you validated all deployed pre-trained weights by running the references script with the --test-only --pretrained arguments? This is a final step we usually do just prior to merging to confirm that the accuracies reported in the docs are accurate and that we are good to go. See the bottom of the description of PR Add MobileNetV3 Architecture in TorchVision #3182 for an example.
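A short, abridged illustration of the export pattern referred to in the first question (only a couple of names shown; not the full module):

```python
# Builders and the main class are exported via __all__; building blocks such as
# SimpleStemIN stay importable but are not advertised as public API.
import torch.nn as nn

__all__ = ["RegNet", "regnet_y_400mf", "regnet_x_400mf"]  # abridged


class SimpleStemIN(nn.Sequential):  # not in __all__, intentionally not _-prefixed
    """Importable by developers exploring other RegNet variants."""
```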

@kazhang (Contributor, Author) commented Sep 29, 2021

@datumbox yes, SimpleStemIN and BottleneckTransform are common building blocks in RegNet; other developers might want to use them when exploring other RegNet variants.

@kazhang (Contributor, Author) commented Sep 29, 2021

I've tested the pretrained models on ImageNet:

| model | Acc@1 | Paper Acc@1 | Acc@5 | command |
| --- | --- | --- | --- | --- |
| regnet_y_400mf | 74.024 | 74.1 | 91.68 | `srun -p $TRAIN_QUEUE --cpus-per-task=96 --gpus-per-node=8 python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --model regnet_y_400mf --batch-size 128 --test-only --pretrained` |
| regnet_y_800mf | 76.42 | 76.3 | 93.136 | `srun -p $TRAIN_QUEUE --cpus-per-task=96 --gpus-per-node=8 python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --model regnet_y_800mf --batch-size 128 --test-only --pretrained` |
| regnet_y_8gf | 79.966 | 79.9 | 95.1 | `srun -p $TRAIN_QUEUE --cpus-per-task=96 --gpus-per-node=8 python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --model regnet_y_8gf --batch-size 64 --test-only --pretrained` |
| regnet_x_400mf | 72.83 | 72.7 | 90.95 | `srun -p $TRAIN_QUEUE --cpus-per-task=96 --gpus-per-node=8 python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --model regnet_x_400mf --batch-size 128 --test-only --pretrained` |
| regnet_x_800mf | 75.19 | 75.2 | 92.418 | `srun -p $TRAIN_QUEUE --cpus-per-task=96 --gpus-per-node=8 python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --model regnet_x_800mf --batch-size 128 --test-only --pretrained` |
| regnet_x_8gf | 79.324 | 79.3 | 94.694 | `srun -p $TRAIN_QUEUE --cpus-per-task=96 --gpus-per-node=8 python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --model regnet_y_8gf --batch-size 64 --test-only --pretrained` |

@datumbox mentioned this pull request on Sep 29, 2021
@kazhang merged commit 194a084 into pytorch:main on Sep 29, 2021
@github-actions commented:

Hey @kazhang!

You merged this PR, but no labels were added.

facebook-github-bot pushed a commit that referenced this pull request Oct 1, 2021
Summary:
* initial code

* add SqueezeExcitation

* initial code

* add SqueezeExcitation

* add SqueezeExcitation

* regnet blocks, stems and model definition

* nit

* add fc layer

* use Callable instead of Enum for block, stem and activation

* add regnet_x and regnet_y model build functions, add docs

* remove unused depth

* use BN/activation constructor and ConvBNActivation

* add expected test pkl files

* allow custom activation in SqueezeExcitation

* use ReLU as the default activation

* reuse SqueezeExcitation from efficientnet

* refactor RegNetParams into BlockParams

* use nn.init, replace np with torch

* update README

* construct model with stem, block, classifier instances

* Revert "construct model with stem, block, classifier instances"

This reverts commit 850f5f3.

* remove unused blocks

* support scaled model

* fuse into ConvBNActivation

* make reset_parameters private

* fix type errors

* fix for unit test

* add pretrained weights for 6 variant models, update docs

Reviewed By: prabhat00155, NicolasHug

Differential Revision: D31309546

fbshipit-source-id: 56e01f105279c3d3c5514607c23d2835896b3d03
cyyever pushed a commit to cyyever/vision that referenced this pull request Nov 16, 2021
Linked issue: RegNet in torchvision ?

8 participants