RegNet in torchvision ? #2655

Closed
blefaudeux opened this issue Sep 8, 2020 · 32 comments · Fixed by #4403 or #4530
Comments

@blefaudeux

blefaudeux commented Sep 8, 2020

🚀 Feature

Add RegNet trunks in torchvision

Motivation

RegNets were proposed in this paper (https://arxiv.org/pdf/2003.13678.pdf) and show very interesting performance and speed. They have been open sourced already, but are not usable in a straightforward way for people used to having reference models in torchvision. Another implementation is available in ClassyVision (I'm a co-author of this one), but it does not cover all use cases.

Pitch

Start from the ClassyVision RegNet support and implement RegNets in torchvision.

Alternatives

Let users use RegNets from external implementations

Additional context

This has been discussed with @pdollar, one of the RegNet authors. CC @fmassa

cc @vfdev-5

@vfdev-5
Collaborator

vfdev-5 commented Sep 8, 2020

@blefaudeux thanks for the suggestion ! Which tasks do you have in mind for this model: at least classification, right ?

> Another implementation is available in ClassyVision (I'm a co-author of this one), but it does not cover all use cases.

Could you detail which use cases the ClassyVision implementation does not cover ?

Would you like to draft a PR for that ? Otherwise, I or someone else can do that.

@blefaudeux
Author

blefaudeux commented Sep 8, 2020

> @blefaudeux thanks for the suggestion ! Which tasks do you have in mind for this model: at least classification, right ?

> Another implementation is available in ClassyVision (I'm a co-author of this one), but it does not cover all use cases.

> Could you detail which use cases the ClassyVision implementation does not cover ?

Oh, I just meant that not everyone is using ClassyVision, obviously. For instance, I came across users telling me that they were sticking to EfficientNets or ResNets because they were only willing to consider torchvision.

> Would you like to draft a PR for that ? Otherwise, I or someone else can do that.

I'm not sure what there is to know for a model to be supported by torchvision, apart from the raw code (which I can handle indeed, or anybody else, no preference). Are there some licence pre-requisites, pre-trained models, authorship constraints (validation from the original authors ?), things like that ? I don't have that much time right now, so if the requirements are clear (or minimal :)) I can handle that starting from the implementation in ClassyVision, else if there's some know-how required I would gladly stay around to assist but not do it myself

@vfdev-5
Collaborator

vfdev-5 commented Sep 8, 2020

> Are there some licence pre-requisites, pre-trained models, authorship constraints (validation from the original authors ?), things like that ?

Excellent question. I'd say we have to provide the model's implementation and ImageNet pretrained weights. In the docstring we provide some information about the model, a link to the paper, etc. For example, MNasNet:

def mnasnet0_5(pretrained=False, progress=True, **kwargs):
    """MNASNet with depth multiplier of 0.5 from
    `"MnasNet: Platform-Aware Neural Architecture Search for Mobile"
    <https://arxiv.org/pdf/1807.11626.pdf>`_.
    Args:
        ...
    """

ImageNet pretrained weights often come from retraining, but there are cases where they were converted, say, from TF weights. If torchvision's implementation is a copy of the one in ClassyVision, maybe we can reuse their weights, if there are any...

@fmassa can you also comment on this question, please ?

> I don't have that much time right now, so if the requirements are clear (or minimal :)) I can handle that starting from the implementation in ClassyVision, else if there's some know-how required I would gladly stay around to assist but not do it myself

No worries. I can send a PR where I copy and adapt the implementation from ClassyVision, and you could check that it is correct.

@blefaudeux
Author

> ImageNet pretrained weights often come from retraining, but there are cases where they were converted, say, from TF weights. If torchvision's implementation is a copy of the one in ClassyVision, maybe we can reuse their weights, if there are any...

I could provide reference weights from a ClassyVision ImageNet training, fairly easy to reproduce (if we're ok to limit this to some members of the RegNet family, probably not all of them :)). Another option is to translate the weights in the PyCls repo/model zoo, but that's some work because the model definition is not exactly the same (even if the actual underlying architecture is, of course). CC @mannatsingh from Classy

@pdollar

pdollar commented Sep 8, 2020

Yes, all model weights are available in pycls, not sure how easy it is to convert them to classy vision format. Link: https://github.com/facebookresearch/pycls/blob/master/MODEL_ZOO.md

@vfdev-5
Collaborator

vfdev-5 commented Sep 8, 2020

@blefaudeux @pdollar thanks for the details ! So, we would rather have certain RegNet families here implemented as in Classy than in the pycls format, right ? I'll let you pick the families you think are the most interesting for users.
If there are families we would like to include, but you do not have the weights, we can retrain them too.

It would probably be helpful for the future to add some info, here in torchvision, on the reason why we preferred one implementation (classy) to another (pycls) if they are a bit different...

@blefaudeux
Author

> @blefaudeux @pdollar thanks for the details ! So, we would rather have certain RegNet families here implemented as in Classy than in the pycls format, right ? I'll let you pick the families you think are the most interesting for users.
> If there are families we would like to include, but you do not have the weights, we can retrain them too.

> It would probably be helpful for the future to add some info, here in torchvision, on the reason why we preferred one implementation (classy) to another (pycls) if they are a bit different...

Sorry for the imprecision, I forgot that the context is clearly not trivial, trying to address that :

  • the pycls implementation is the original one, straight from FAIR (research), it exposes a lot of features which were mostly there for experimentation (architecture search) but are not useful once the best models in the family were found
  • the ClassyVision implementation was tentatively more production ready, a little easier to read, with all the code to reproduce the paper but not more
  • what I meant by "not all the regnets" in the above was that RegNets define a family of models, each of them being fully defined by a couple of coefficients, loosely comparable with ResNext for instance (change the width, depth, etc.. even if the scaling is a little more complex here). I just meant that one has to decide on a couple of models whose weights would be provided, for instance the models comparable to RN50, RN101, things like that, but covering each and every data point may not be needed. Additionally RegNets can scale to very big models for vision, 128GF or more, and training these would be very costly and probably not needed for torchvision users, or at least that was my assumption.
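To make "fully defined by a couple of coefficients" concrete, here is a minimal sketch of the width parameterization described in the RegNet paper: a linear ramp of per-block widths is quantized to powers of a multiplier, then consecutive blocks of equal width are grouped into stages. The function name and the parameter values are illustrative assumptions, not taken from pycls or ClassyVision, and the sketch omits the bottleneck ratio, group width, and the group-width compatibility adjustment that real implementations apply.

```python
import math


def regnet_stage_widths(w_0, w_a, w_m, depth, quantum=8):
    """Return (widths, depths) per stage for a RegNet-style parameterization.

    w_0: initial width, w_a: slope, w_m: width multiplier, depth: number of blocks.
    This is a hypothetical helper written for illustration only.
    """
    # Linear per-block widths: u_j = w_0 + w_a * j
    u = [w_0 + w_a * j for j in range(depth)]
    # Quantize each u_j to the nearest integer power of w_m (relative to w_0)
    s = [round(math.log(u_j / w_0) / math.log(w_m)) for u_j in u]
    # Quantized widths, rounded to a multiple of `quantum`
    block_widths = [int(round(w_0 * w_m ** s_j / quantum) * quantum) for s_j in s]
    # Group consecutive blocks that share a width into stages
    widths, depths = [], []
    for w in block_widths:
        if widths and widths[-1] == w:
            depths[-1] += 1
        else:
            widths.append(w)
            depths.append(1)
    return widths, depths


# Illustrative coefficients (an assumption, chosen only to exercise the function)
widths, depths = regnet_stage_widths(w_0=24, w_a=36.44, w_m=2.49, depth=13)
print(widths, depths)
```

Changing only these few numbers yields a different member of the family, which is why a decision is needed on which members get pretrained weights.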

@vfdev-5
Collaborator

vfdev-5 commented Sep 8, 2020

@blefaudeux thanks for the explanation ! I understand the context better now. Yes, this sounds good !

> I just meant that one has to decide on a couple of models whose weights would be provided, for instance the models comparable to RN50, RN101, things like that

It would be good if you have an idea of which RegNetX and RegNetY models to provide with pretrained weights. Maybe those from the paper's tables 5 and 6 :

  • REGNETX-3.2GF
  • REGNETX-6.4GF
  • REGNETX-12GF
  • REGNETX-4.0GF
  • REGNETX-8.0GF
  • REGNETY-400MF
  • REGNETY-600MF
  • REGNETY-800MF
  • REGNETY-1.6GF
  • REGNETY-4.0GF
  • REGNETY-8.0GF

What do you think ?

@blefaudeux
Author

Looks good to me ! It's probably worth writing a small tool to transcribe the weights between pycls and this more streamlined implementation, it's something I could do. Not sure whether @mannatsingh would have something handy around to help on that ?

vfdev-5 self-assigned this Sep 8, 2020
@mannatsingh

@blefaudeux I don't have anything available to convert pycls weights to classy, unfortunately.
While you should be able to get similar results by training the models from scratch in Classy (we verified this for a few models), it might be easier to just convert the weights available in pycls.

@pdollar

pdollar commented Sep 9, 2020

What is the motivation for only including a subset of the models? Is it because they have to be retrained?

The reason I ask is that the benefit of RegNets is they give good accuracy models across a wide range of flop regimes, as opposed to say ResNet which typically is only optimized for a narrow range of 4GF-12GF (ResNet50-ResNet152). On the other hand RegNets can be good at very small sizes (200MF) and very large sizes (32GF). The very small and very large models are potentially the most interesting (for say mobile and state-of-the-art research). So I would advocate including the full range of models if possible.

@pdollar

pdollar commented Sep 9, 2020

It may be better to figure out how to convert pycls weights to classy vision weights. i don't know classy vision well, but i can't imagine it being that hard?!? (famous last words :P )

@blefaudeux
Author

@pdollar it's not so much of an issue with Classy actually, it's just that some names have changed in between the pycls and classy implementation of the RegNets, I should be able to fix that by loading/mapping names/saving again. I'm just a bit wary of something really subtle there, but from a distance it should not be too hard indeed and probably the best thing to do
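The "loading / mapping names / saving again" step could look roughly like the sketch below. It works on any mapping from parameter name to value, so it runs without PyTorch. The rename rules are hypothetical placeholders: the actual pycls and Classy parameter names are not reproduced here and would have to be read from both checkpoints.

```python
import re
from collections import OrderedDict

# Hypothetical rename rules (regex pattern -> replacement). These names are
# made up for illustration and do NOT reflect the real pycls or Classy layouts.
RENAME_RULES = [
    (r"^stem\.conv\.", "stem.0."),
    (r"^s(\d+)\.b(\d+)\.", r"stage\1.block\2."),
]


def remap_state_dict(state_dict, rules):
    """Return a new ordered mapping with every key rewritten by the rules."""
    remapped = OrderedDict()
    for old_key, value in state_dict.items():
        new_key = old_key
        for pattern, replacement in rules:
            new_key = re.sub(pattern, replacement, new_key)
        # Two source keys landing on the same target would silently drop weights
        if new_key in remapped:
            raise KeyError(f"two source keys map to {new_key!r}")
        remapped[new_key] = value
    return remapped


def diff_keys(remapped, expected_keys):
    """Return (missing, unexpected) key sets relative to the target model."""
    got, expected = set(remapped), set(expected_keys)
    return expected - got, got - expected


# Toy stand-in for a checkpoint; real values would be tensors
source = {"stem.conv.weight": 1, "s1.b2.f.a.weight": 2, "head.fc.weight": 3}
converted = remap_state_dict(source, RENAME_RULES)
print(list(converted))
```

With real checkpoints, the mapping would come from `torch.load` and the result would go through `model.load_state_dict(..., strict=True)`, which reports missing or unexpected keys and shape mismatches. Comparing the outputs of both models on the same input is a reasonable extra guard against the "something really subtle" case.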

@fmassa
Member

fmassa commented Sep 9, 2020

Hi,

I'm not sure we would be ready to add RegNet to torchvision yet.

We generally require a minimum number of citations for the paper introducing a model before we include it in torchvision, similarly to what we do for PyTorch. We can reconsider this decision in 6 months.

Users can obtain RegNet variants by using PyTorch Image Models: https://github.com/rwightman/pytorch-image-models

@mannatsingh

mannatsingh commented Sep 9, 2020

Thanks for the context @fmassa .

@blefaudeux one more thing to clarify, in your original note, you mention -

> Another implementation is available in ClassyVision (I'm a co-author of this one), but it does not cover all use cases.

From what I understand, torchvision's implementations are even more strict (even fewer configuration parameters allowed, if any) - @fmassa can correct me if I'm wrong. Also, you should be able to generate any RegNet using the Classy implementation, so I wouldn't want anyone reading this issue to get the wrong impression :)

@blefaudeux
Author

@mannatsingh ah ok, that's not what I meant with does not cover all use cases. I just meant that not every user of torchvision is using classy vision obviously (MoCo randomly comes to mind, they support torchvision out of the box but that's all), so I just meant that having an implementation in Classy was not enough to make RegNets truly accessible to the pytorch ecosystem.

I did not know about PyTorch Image Models linked above by @fmassa, seems that I'm not the only one, but if it does the trick then why not. I personally think that the citation metric can easily be gamed, but that's probably not a good enough incentive.

@fmassa
Member

fmassa commented Sep 10, 2020

We try to keep the model implementations fairly simple -- the space of configurations for a model is potentially infinite, and trying to expose too many options can make things very hard to understand for users.

> I personally think that the citation metric can easily be gamed, but that's probably not a good enough incentive.

I agree that citations per se is not a perfect metric. But given the amount of research and activity around computer vision nowadays, with 100s of papers every year claiming SOTA, we need some metric to be able to define what should be in torchvision or not -- otherwise we will end up having 100s of models which are all respectively SOTA during their respective submission time, but being SOTA doesn't involve only architectural changes to the model but also to the training recipe.

We will be adding more information about the criteria for a model / op to be included in torchvision in the CONTRIBUTING.md file, and we have an issue tracking it in #2651. Thanks for the discussion, and let us know if there is anything that you disagree with or would like to add.

@fmassa
Member

fmassa commented Nov 10, 2020

Let's keep this issue open for now to track RegNets

fmassa reopened this Nov 10, 2020
@mathmanu

ResNets are heavy in terms of compute. MobileNets need less compute, but are heavy in terms of memory access (depthwise layers). RegNets provide a good balance between these extremes, especially RegNetX. I highly recommend them and wish to see them as part of torchvision.

@fmassa
Member

fmassa commented Jun 14, 2021

@mathmanu we are going to be adding RegNets in torchvision in the coming months

@mathmanu

mathmanu commented Jun 14, 2021

Thank you.

Note that RegNetY models are memory intensive due to the Squeeze-and-Excitation layers. Some of the advantage that RegNet provides will be removed by those layers. Doubling the memory transfer requirement to get a very small lift in accuracy is not a good tradeoff, especially for embedded devices.

Hence I am also looking forward to torchvision having models without Squeeze-and-Excitation, say RegNetX. If possible, lite versions of MobileNetV3 as well. As a reference, tensorflow/models provides such models, which they call either minimalistic or lite. References:
https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet
https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/lite

@fmassa
Member

fmassa commented Aug 12, 2021

Thanks for the pointers @mathmanu !

cc @datumbox for thoughts if we should add the lite variants of MobileNetV3 or if they are already covered by the mobilenet_v3_small variant

@datumbox
Contributor

It's straightforward to extend the existing implementation to support the lite/minimalistic versions of MobileNetV3. I think the most appealing reason to support them is that the minimalistic version uses ReLU instead of hard swish, and this plays nicely with quantization.

Based on the aforementioned references, the minimalistic version exchanges 5.6 accuracy points for roughly a 23% speed improvement, which is definitely non-trivial. On the other hand, these specific versions are not described in the paper, so the only reference for them is the official repo.

To answer the question of whether we should add them or not: if there is enough demand from the community, we can do it. I personally don't have a use case where they are required, and I think there are other architectures we should offer before considering the minimalistic MobileNetV3 models, but I could be wrong.

@pdollar

pdollar commented Aug 23, 2021

Another major industry player using RegNets :-D
In other words, RegNets are running in a million cars or so!
Hope this is a useful datapoint for this discussion.

Source: https://www.youtube.com/watch?v=j0z4FweCy4M&t=2829s

@datumbox
Contributor

datumbox commented Aug 23, 2021

@pdollar Thanks for the input. Agreed, and it's at the top of our list to add it to TorchVision soon. It's part of the #3911 epic.

@kazhang
Contributor

kazhang commented Aug 23, 2021

In the following weeks, I will be working on upstreaming RegNet from Classy Vision to TorchVision.

@pdollar

pdollar commented Aug 23, 2021

Great to hear everyone, thank you :)

@mannatsingh

@kazhang please feel free to get in touch with me for any details / code reviews :)

@kazhang
Contributor

kazhang commented Sep 29, 2021

Not done yet. I'm still training the rest of the models. I will add the pretrained weights in the following days.

@blefaudeux
Author

Thank you @kazhang , great work !

@mathmanu

mathmanu commented Oct 30, 2021

Converting an existing model to a lite model seems to be pretty easy. Once the model is created, just search through the model and replace torchvision.ops.misc.SqueezeExcitation with torch.nn.Identity. We should also replace torch.nn.Hardswish with torch.nn.ReLU. In addition, if we want to replace torch.nn.ReLU6 with torch.nn.ReLU, that's also easy by the same method.

It should be possible to create a utility function to transform any given model to a "lite" model.
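A minimal sketch of such a utility, assuming a plain recursive walk over `named_children()` is enough: the helper name `replace_modules` is hypothetical, and the toy model below stands in for a real torchvision model. Swapping layers changes the function the network computes, so pretrained weights should not be expected to keep their accuracy afterwards.

```python
import torch
from torch import nn


def replace_modules(model, replacements):
    """Recursively swap child modules in place.

    `replacements` maps a module class to a zero-argument factory that builds
    its substitute. Hypothetical helper, written for illustration only.
    """
    # Copy the child list first, since children are reassigned while looping
    for name, child in list(model.named_children()):
        for old_type, factory in replacements.items():
            if isinstance(child, old_type):
                setattr(model, name, factory())
                break
        else:
            # No rule matched this child: descend into it
            replace_modules(child, replacements)
    return model


# Toy model standing in for a real network
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.Hardswish(),
    nn.Sequential(nn.Conv2d(8, 8, kernel_size=3, padding=1), nn.ReLU6()),
)

# The Squeeze-and-Excitation class named in the comment above could be added
# to this mapping with nn.Identity as its factory.
lite = replace_modules(toy, {nn.Hardswish: nn.ReLU, nn.ReLU6: nn.ReLU})
out = lite(torch.randn(1, 3, 16, 16))
print(lite)
```

Because the factories take no arguments, this approach only fits replacements that do not need the original layer's configuration (channel counts, strides, and so on).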

@datumbox @fmassa What do you think? That would make several embedded friendly models available in torchvision.

We could also think about other transformations (for example replace heavy 3x3 non-grouped convolutions by 3x3-depthwise or grouped convolutions) - but we don't have to go that far in the initial implementation.

@datumbox
Contributor

@mathmanu If you were to do something like that, you would have to retrain from scratch. You might be able to get away with a ReLU6 to ReLU conversion with minimal accuracy loss, but I can't say it's going to be the same for the rest.

Personally I think such model surgeries are best served by writing custom code which meets your exact needs and utilizing PyTorch FX's replace_pattern to reduce the amount of code you write.
