TResnets #124
Hi @vrandme

Part 1:

Dedicated SE - again, I didn't reinvent the wheel here, but I do think I have some important optimizations that get a better speed-accuracy tradeoff. In general, SE layers are in my opinion the biggest advancement in deep learning architectures in recent years; they give an excellent speed-accuracy tradeoff. I tried replacing SE with plain ECA in TResNet, and it lowered the scores a bit. I might revisit it in the future.

Inplace-ABN - inplace ABN is a marvelous thing. If PyTorch had any sense, they would make it the default option. With inplace ABN you can use a batch size twice as large. Twice!

Block selection - I read the RegNet paper but haven't tried their models yet. I recommend being very, very cautious about it; it seems a lot less practical than the explosive headlines suggest. EfficientNet models also claimed to reinvent the wheel, and they turned out to be terrible models.

To be continued.
SpaceToDepth - (2) you get an image, do SpaceToDepth, and then just do a bunch of repeated blocks. Repeated (simple) blocks are an efficient and ultra-fast GPU design.

Regarding performance: I (strongly) don't see why SpaceToDepth would damage scores in image segmentation or detection pipelines. We are currently testing TResNet on detection.

Regarding interpretability: you might be right, I can see why the activation maps after SpaceToDepth can be harder to interpret. But is interpretability really that important? I have never used it.
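The stem described above can be sketched in a few lines. This is a minimal NumPy illustration of the SpaceToDepth rearrangement (block size 4, as in the TResNet paper), not the actual TResNet code: each 4x4 spatial patch is folded losslessly into the channel dimension.

```python
import numpy as np

def space_to_depth(x: np.ndarray, block: int = 4) -> np.ndarray:
    """Rearrange x of shape (C, H, W) into (C*block*block, H/block, W/block):
    every block x block spatial patch is folded into the channel axis."""
    c, h, w = x.shape
    x = x.reshape(c, h // block, block, w // block, block)
    x = x.transpose(0, 2, 4, 1, 3)            # bring the patch axes forward
    return x.reshape(c * block * block, h // block, w // block)

img = np.arange(3 * 8 * 8, dtype=np.float32).reshape(3, 8, 8)
out = space_to_depth(img, block=4)
print(out.shape)  # (48, 2, 2)
```

No convolution or pooling happens here, which is why the stem is so cheap on GPU: it is a pure memory rearrangement that preserves every pixel.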
@mrT23 First of all, thank you for the reply. I hope you realize that I know that developing, training, and deploying ImageNet-scale models is no easy task, and I respect and thank you for that.

Antialiasing: I was unaware of its benefits for transfer learning and fine-grained tasks; I was more interested in consistency/robustness. The fact that such benefits exist fits very nicely with your model (good for transfer learning or fine-grained tasks on GPUs).

Dedicated SE: thank you for sharing your anecdotal evidence on ECA. Although ECA is very efficient, it might be less versatile than SE since they have different mechanisms. ECA might have to increase channel widths and parameters to compare effectively with SE in many other yet-to-be-tested scenarios, and then it's back down the hyperparameter rabbit hole. There's nothing wrong with sticking with well-accepted best practices, which you not only adopted but improved upon.

"Inplace ABN is a marvelous thing." Very, very true.

SpaceToDepth: "SpaceToDepth design is the future." I actually agree! To contextualize my opinions: in the end, even if the activation maps after SpaceToDepth are hard to interpret, if it proves to be effective (which I think it will), then visualization tools and approaches will grow to work with such a stem.
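For reference on the mechanism being compared here, ECA can be sketched structurally as a global average pool followed by a tiny 1D convolution across the channel descriptor. The kernel weights below are a uniform toy stand-in for the learned weights, so this NumPy sketch only illustrates the structure, not trained behavior:

```python
import numpy as np

def eca(x: np.ndarray, k: int = 3) -> np.ndarray:
    """Efficient Channel Attention, structurally: x is (C, H, W).
    Squeeze to a per-channel descriptor, run a k-tap 1D conv across
    channels (only k weights in total), gate with a sigmoid."""
    desc = x.mean(axis=(1, 2))                 # global average pool -> (C,)
    w = np.full(k, 1.0 / k)                    # toy stand-in for learned weights
    gate = np.convolve(desc, w, mode="same")   # local cross-channel interaction
    gate = 1.0 / (1.0 + np.exp(-gate))         # sigmoid
    return x * gate[:, None, None]

y = eca(np.ones((8, 4, 4)))
print(y.shape)  # (8, 4, 4)
```

The contrast with SE is visible in the middle line: ECA has only k parameters and each channel's gate sees just its k neighbors, whereas SE's two fully connected layers let every channel influence every other.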
Hi Chris,

One last insight regarding SE and ECA - anyway, I gave a talk last week about "The dark magic behind deep learning" that contains further insights on the TResNet models, as well as other issues.

All the best
@mrT23
I didn't expect you to respond at all at first, but now I'm getting awesome insights.
Is it possible to transfer an issue from one repo to another? If so, I would be happy to bring it to https://github.com/mrT23/TResNet .

Tal
GitHub did eventually implement issue transfer, but I believe it's still limited to repos in the same org. Cut and paste of comments could be done but is tedious. Often it's best to close the issue in one repo, create a new one in the other, and paste a link to the old one in the comments for reference.

Some interesting discussion here. A comment on ECA: for the overhead, it works quite well with ResNet-like network architectures, as you both know. I thought it might be a great fit for a lighter MBConv/InvertedResidual net like EfficientNet, but it decreased performance over a baseline with no SE. I tried all reasonable locations in the block too.
Thanks @rwightman, please close the issue.
Dear Tal, |
@linhduongtuan probably best to ask those questions in the official TResNet repository, and give it a star when you visit :) https://github.com/mrT23/TResNet
@vrandme @mrT23 Interesting discussion. Btw, you folks can find the RegNet models here: https://github.com/facebookresearch/pycls/blob/master/MODEL_ZOO.md
I regret not being able to participate in the discussion laid out in #121 as it happened.
As it stands, the original author of TResNet, @mrT23, was present.
Since this post comes after the fact, with TResNet (from the original author, at that!) already pulled into this repo,
I would just like to lay out my opinions on it and ask some questions that hopefully @mrT23 will be able to answer.
As summarized in the paper, I view that the fundamental contributions of TResNet boil down to the following.
I will address these in increasing complexity.
Antialiasing (https://github.com/adobe/antialiased-cnns) is a well-known and tested method of increasing accuracy and consistency. It was also used in Assembled-CNN.
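The core idea behind antialiased downsampling is to low-pass blur before subsampling so that high frequencies don't alias. Here is a minimal single-channel NumPy sketch using a fixed 3x3 binomial filter; the actual library applies such fixed filters per channel inside its strided layers:

```python
import numpy as np

def blur_pool(x: np.ndarray, stride: int = 2) -> np.ndarray:
    """Anti-aliased downsampling of a single-channel image x (H, W):
    blur with a fixed 3x3 binomial kernel, then subsample by `stride`."""
    k1 = np.array([1.0, 2.0, 1.0])
    kernel = np.outer(k1, k1) / 16.0          # sums to 1, so DC is preserved
    h, w = x.shape
    pad = np.pad(x, 1, mode="reflect")        # same-size blur
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = (pad[i:i + 3, j:j + 3] * kernel).sum()
    return out[::stride, ::stride]            # subsample after blurring

print(blur_pool(np.ones((8, 8))).shape)  # (4, 4)
```

Because the blur removes the frequencies that plain strided subsampling would fold back into the signal, small input shifts change the output far less, which is the consistency benefit discussed above.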
Dedicated SE: @mrT23 made great efforts to streamline and optimize Squeeze-and-Excite. I would like to find out how it fares against Efficient Channel Attention. In theory, ECA (or my own cECA) could be optimized similarly to show better parameter and computational efficiency and accuracy.
As it is, TResNet is not amenable to drop-in replacements of attention, but it could easily be rendered as such.
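For comparison with the ECA mechanism, a plain SE block looks like the following NumPy sketch (random toy weights, not TResNet's optimized in-place variant): squeeze to per-channel statistics, pass them through a bottlenecked two-layer MLP, and rescale channels with the resulting sigmoid gates.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Plain Squeeze-and-Excite over x of shape (C, H, W):
    squeeze (global average pool) -> FC down to C//r + ReLU ->
    FC back up to C + sigmoid -> rescale channels."""
    s = x.mean(axis=(1, 2))                      # squeeze: (C,)
    z = np.maximum(w1 @ s + b1, 0.0)             # excitation, reduced width
    g = 1.0 / (1.0 + np.exp(-(w2 @ z + b2)))     # per-channel gates in (0, 1)
    return x * g[:, None, None]

rng = np.random.default_rng(0)
C, r = 16, 4
x = rng.normal(size=(C, 8, 8))
w1, b1 = rng.normal(size=(C // r, C)), np.zeros(C // r)
w2, b2 = rng.normal(size=(C, C // r)), np.zeros(C)
print(se_block(x, w1, b1, w2, b2).shape)  # (16, 8, 8)
```

The two dense layers cost roughly 2*C^2/r parameters per block, which is the overhead that ECA's k-weight convolution removes; the question raised above is whether that full cross-channel mixing is what makes SE more versatile.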
Inplace-ABN:
I wonder if such a method could be applied to EvoNorm (https://arxiv.org/pdf/2004.02967.pdf).
Considering EvoNorm is itself an attempt to unify activation and normalization layers and search for them in an end-to-end manner, it's possible that such a method is not necessary for EvoNorm.
I haven't seen or conducted enough testing on EvoNorm to tell.
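For intuition only: the memory argument for Inplace-ABN is that normalization and activation can share a single buffer instead of each allocating its own. This toy NumPy sketch overwrites its input in place; the real Inplace-ABN additionally restricts itself to invertible activations so the backward pass can reconstruct the inputs it needs from the stored outputs, which is where the real training-time savings come from.

```python
import numpy as np

def bn_act_inplace(x, gamma, beta, eps=1e-5, slope=0.01):
    """Batch-norm + leaky-ReLU over x of shape (N, C), computed by
    overwriting x instead of allocating a second activation tensor."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x -= mu                           # every step below mutates x in place
    x /= np.sqrt(var + eps)
    x *= gamma
    x += beta
    np.maximum(x, slope * x, out=x)   # leaky-ReLU, written back into x
    return x

rng = np.random.default_rng(1)
x = rng.normal(size=(32, 4))
y = bn_act_inplace(x, np.ones(4), np.zeros(4))
print(y is x)  # True: same buffer, no extra activation tensor
```

Whether EvoNorm's fused norm-activation expressions admit the same invertibility trick is exactly the open question raised above.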
Block selection:
The recent RegNet paper (https://arxiv.org/abs/2003.13678) showed that even for bottleneck blocks a bottleneck ratio of 1 (no channel expansion) can be effective, and it uses such blocks extensively to construct what is ostensibly a simpler ResNet that is more effective.
However, it has only been tested (per the original paper) in a limited capacity, without all the bells and whistles of modern CNNs, so it remains to be seen what kind of performance it would show WITH all the bells and whistles.
Furthermore, while the RegNet paper compares the bottleneck block (1x1, then 3x3 with or without expansion, followed by another 1x1, with a residual connection) against the vanilla block (one 3x3 with or without a residual connection), it is not a proper comparison.
A true vanilla ResNet block would have to have TWO 3x3 layers with a residual connection, like TResNet's basic block.
It might be valuable to see what TResNets could do with all stages built from basic blocks or from bottlenecks without channel expansion. Such a comparison would require concomitant hyperparameter tuning to adjust channel widths and layer counts, but RegNet scaling might offer valuable pointers.
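As a rough back-of-the-envelope comparison of the two block types discussed above (convolution weights only, ignoring BN and biases; the channel count and ratio are illustrative, not taken from either paper):

```python
# Weight counts for a two-3x3 basic block vs a ratio-4 bottleneck block
# at C = 256 channels.
C = 256
basic = 2 * (3 * 3 * C * C)                 # two 3x3 convs: C -> C -> C
b = C // 4                                  # bottleneck width
bottleneck = C * b + 3 * 3 * b * b + b * C  # 1x1 down, 3x3, 1x1 up
print(basic, bottleneck)  # 1179648 69632
```

At equal width the basic block carries far more capacity per block, which is why a fair comparison needs the concomitant retuning of widths and depths mentioned above.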
The SpaceToDepth stem is a valuable tool for increasing GPU throughput. The fact that it maintains or even increases accuracy is the cherry on top.
My concern is that SpaceToDepth is hard to visualize conceptually, and I fear this might make it difficult to visualize functionally. For example, visualizing intermediate-layer activations is an important tool for understanding why a model functions the way it does. I'm not sure how the initial, non-visually-intuitive stem would affect the following layers from an interpretability standpoint.
In a similar vein, I'm concerned that SpaceToDepth might hinder TResNet's ability to be integrated into image segmentation or detection pipelines, for the above reason.
One of the reasons that EfficientNets took so long to be adopted in many image detection frameworks and to displace ResNets was that a meaningful feature extractor was difficult to code.
There have been some attempts (like the version that exists in this very repo), but the difficulty of such an endeavor, alongside difficulties in GPU throughput and fragile training, made it hard to vanquish ResNets.
I would love it if @mrT23 could provide insights on these issues.
To be frank, my interest in (T)ResNets faded after seeing the RegNet paper. I hope that the eventual code and model releases will rekindle it.
In the end, TResNets are insightful, powerful, effective, and efficient models in their own right. My points come more from curiosity than criticism, and I hope @mrT23 understands my appreciation for their work and efforts (especially with regard to incorporating their contributions into this beneficial code base).