Blur #101
Merged: 6 commits merged into master on May 2, 2020

Conversation

chris-ha458 (Contributor)

Continuing my attempts at Issue #90, I implemented a blur pool layer for future integration.
Although I wanted to copy Kornia's implementation, I found that it tightly couples max pooling with the downsampling, and it also has an internal dependency for pyramid downsampling.

Since the original implementation suggests three different anti-aliasing strategies (covering max pooling, average pooling, and strided convolution), I had to decouple the code from both the max pooling and the internal dependency.

I ended up re-implementing the whole thing, which meant making several design choices along the way.
First, I only implemented the two most common binomial filter sizes (3 and 5) for downsampling.
(Kornia seems to implement one, while the original repo implements seven.)

This is because the original paper says the most commonly used filter sizes are 3 and 5, and because the code base I'm working up to (#90) tested only 3 and 5 (and settled on 3). It would not be hard to implement other filter sizes should the need arise.

Secondly, I removed the options for the various padding strategies, since neither Kornia, the original anti-aliasing paper, nor #90 discusses the trade-offs between them, and I kept only the original reflective padding strategy.
I thought about the consequences of the various padding strategies, but it seems that only the original one makes sense.
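
For reference, a minimal sketch of the kind of layer described above (class and argument names here are illustrative, not necessarily what ends up in the PR): reflect-pad, blur with a fixed normalized binomial kernel applied depthwise, then subsample with stride 2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    """Sketch of anti-aliased downsampling: reflect pad -> binomial blur -> stride-2 subsample."""
    def __init__(self, channels, filt_size=3, stride=2):
        super().__init__()
        self.channels = channels
        self.stride = stride
        # Binomial (Pascal's triangle) coefficients for the two supported sizes.
        coeffs = {3: [1., 2., 1.], 5: [1., 4., 6., 4., 1.]}[filt_size]
        a = torch.tensor(coeffs)
        filt = a[:, None] * a[None, :]   # outer product -> 2D kernel
        filt = filt / filt.sum()         # normalize so constant inputs pass through unchanged
        # One copy of the kernel per channel; applied depthwise via groups=channels.
        self.register_buffer('filt', filt[None, None, :, :].repeat(channels, 1, 1, 1))
        pad = (filt_size - 1) // 2
        self.padding = [pad, pad, pad, pad]

    def forward(self, x):
        x = F.pad(x, self.padding, mode='reflect')
        return F.conv2d(x, self.filt, stride=self.stride, groups=self.channels)
```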

Currently the blurpool function is not used by ResNet as intended.

This is because further design choices are necessary.
Although potentially any and all max pools, average pools, and stride-2 convolutions could be blurred with this, the original paper and #90 show that doing so everywhere is both unnecessary and inefficient.

So my next step would be to implement this in the style of #90 (certain downsampling strided convs only), exposing only a boolean argument somewhere.

After that, it could become a value argument selecting between vanilla, #90 style, and the original paper style (all? strided convs and max pools).

I guess I'd need to make some design decisions and then see how it works.

In the meantime, I would like you to look at the code and point out any improvements or changes you would like to see before I move forward.

@rwightman (Collaborator)

@vrandme thanks for the submission. I've been focused on a few other things lately, so I'm a bit slow on this right now. I don't see any issues with what's there so far, but ultimately (as you allude to) the success depends on the integration into an actual network.

What sort of testing have you done so far on this? I assume you did some basic checks, inputting simple patterns where an inspection would confirm appropriate blur. Have you tried integrating it into a basic network and training?

As for the combination with the other layers (let's call it integration strategies), there was some relevant twitter chatter I was involved in a while back on the topic...
https://twitter.com/ducha_aiki/status/1216859128397750278
... and don't miss this branch of the same convo, https://twitter.com/evgeniyzhe/status/1218241324740202496

@chris-ha458 (Contributor, Author) commented Mar 22, 2020

I implemented the rest of the blur.
The option is exposed through an argument on the ResNet function.
It defaults to not applied; blur='max' applies it in the initial max pool of the stem, blur='strided' to the strided convolutions (assembled-CNN style), and blur='max_strided' to both (original paper style). (My push includes documentation in the comments.)
Sample untrained networks for ImageNet are available (resnetblur18 for the original style and resnetblur50 for the assembled-CNN style), but I don't see any use for them aside from testing internal consistency.
I would like to find a way to import the original weights, but copyright concerns and my general inability to port weights from another code base make this difficult for me.
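
As a rough usage illustration (hypothetical calls based on the description above; the exact entry points may differ from the final code):

```python
# Hypothetical usage of the exposed option; constructor name assumed for illustration.
model_plain   = resnet50()                     # blur not applied (default)
model_max     = resnet50(blur='max')           # blur the stem max pool only
model_strided = resnet50(blur='strided')       # blur the strided downsampling convs (assembled-CNN style)
model_both    = resnet50(blur='max_strided')   # blur both (original paper style)
```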

I did not spend too much time on how best to incorporate blur pooling into ResNets, since in my view the original authors and the researchers from Clova AI have already done more than enough testing. I merely implemented the two best methods they found.

As for independent testing, the BlurPool2d function has been tested both manually and against the original implementation on several random inputs.

However, the fully integrated version is almost untested.
I closely examined the code, the printed models (via print(model)), the output from pytorch-summary, and the output on random input.

This is obviously far from enough testing, but since I am unable to port the original weights for various reasons at this time, I think this is as far as I can go for the moment.
The fact that the original code and weights only cover ImageNet and not CIFAR does not help.
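
For what it's worth, sanity checks of the kind described above can be written in a few lines (a hypothetical sketch reusing the BlurPool2d sketch from earlier in this thread):

```python
import torch

blur = BlurPool2d(channels=4, filt_size=3, stride=2)

# A constant image should pass through unchanged: the kernel sums to 1 and
# reflect padding only repeats existing (constant) values.
x = torch.full((1, 4, 16, 16), 3.0)
y = blur(x)
assert torch.allclose(y, torch.full_like(y, 3.0), atol=1e-6)

# Spatial size should halve with stride 2.
x = torch.randn(2, 4, 32, 32)
assert blur(x).shape == (2, 4, 16, 16)
```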

Testing-wise, I think the following could be done:

  • Find a way to port the original weights and test whether they work with my blur pool code integrated into your ResNet code.
    (Highly likely to work, since the blur pool itself does not introduce any new parameters; nonetheless, confirming that the weights port and run correctly is important.)

  • Train from scratch (at least resnet18 and resnet50 should be done to exercise the major code paths) and compare results with the original paper.
    (I just realized that the original GitHub repo provides weights for resnet18, so that testing may be tractable in the near future.)

@rwightman (Collaborator)

@vrandme thanks, I'll see if I can get a training session running at some point against a somewhat recent non-blur ResNet as a comparison point.

@rwightman (Collaborator) commented Apr 1, 2020

@vrandme I fixed the bottleneck and ran a training session on resnetblur50 ... 79.288 top-1 without using the JSD loss, not bad. There is some other cleanup I want to do before merging, but definitely seems worth adding.

Also, I noticed someone tweeting about this yesterday: https://github.com/mrT23/TResNet. It uses the AA blur, a stem modification, and Inplace ABN for the BN layers.

@chris-ha458 (Contributor, Author)

I did read the paper, and here are my thoughts.
Many parts are quite invasive changes; they won't be easy, or even possible, to add as drop-in argument options
(for example: the space-to-depth stem, mixing basic and bottleneck blocks, and the in-place fused BN and LeakyReLU).
Most of their ideas are sound. I have much more to say on that, so I will open another issue on this repo discussing which parts might be integrated into ResNet and which wouldn't be.

Getting back to their blur implementation, you can consider it a further cut-down and optimized version of my code.
There are some arguments you can pass to the function, but they are blocked by asserts.
Only the b2 (1, 2, 1) filter with stride 2 and reflect padding is implemented.
CUDA, half precision, and JIT are enabled.

Pseudocode is given in the paper (figure not reproduced here), and the actual code is in their repository.
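
Concretely, the fixed b2 kernel is just the normalized outer product of (1, 2, 1) with itself (shown here in the sketch-level style from earlier in this thread, not their implementation):

```python
import torch

a = torch.tensor([1., 2., 1.])
b2 = (a[:, None] * a[None, :]) / (a.sum() ** 2)   # 3x3 kernel, sums to 1
print(b2)
# tensor([[0.0625, 0.1250, 0.0625],
#         [0.1250, 0.2500, 0.1250],
#         [0.0625, 0.1250, 0.0625]])
```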

Here is my conclusion:
The authors are aware of, and even mention, the original Assembled-CNN paper.
Their testing and implementation are further evidence that the most basic b2 version with stride 2 (kernel size 3x3) is size- and computation-efficient, with tangible accuracy and robustness benefits.

Unless you want to steer your code base in the maximally experimentable direction,
I think replacing my code with their implementation, or adjusting it to that effect, would be the most practical direction considering speed and simplicity of code.
Their license is Apache 2.0, so there are no license concerns either.

When this is merged, I'll open a TResNet issue here discussing the overall changes and the low-hanging fruit for this codebase.

@rwightman (Collaborator)

@vrandme I'm in no rush to replicate TResNet; I was just curious whether you'd seen it, since it's in the same lineage of ResNet experiments. I'm not sure JIT will actually have an impact for the blur, but that's pretty easy to test.

Agreed that some of the changes are hard to integrate nicely in an on/off fashion that won't impact backwards weight compatibility. I was thinking of fiddling with the ABN impl before reading this paper; I may try a BnAct factory version of ResNet and see if I can maintain weight compat.

I made some changes to the blur filter generation in this PR so it works for any filter size. Also got the filters out of the state_dict without too much of a hack.
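
As an illustration of filter generation that works for any size (a sketch of the idea, not necessarily the code in this PR): row n of Pascal's triangle can be built by repeatedly convolving [1, 1] with itself, then taking the normalized outer product. One way to keep the resulting filter out of the state_dict in recent PyTorch is register_buffer(..., persistent=False), though the PR may handle it differently.

```python
import torch

def binomial_filter_2d(filt_size: int) -> torch.Tensor:
    """Normalized 2D binomial (Pascal's triangle) filter of arbitrary size (illustrative sketch)."""
    a = torch.tensor([1.])
    for _ in range(filt_size - 1):
        # Convolving with [1, 1] advances one row in Pascal's triangle.
        a = torch.conv1d(a.view(1, 1, -1), torch.ones(1, 1, 2), padding=1).flatten()
    filt = a[:, None] * a[None, :]
    return filt / filt.sum()

print(binomial_filter_2d(3).shape)  # torch.Size([3, 3]): (1,2,1) outer product / 16
print(binomial_filter_2d(5).shape)  # torch.Size([5, 5]): (1,4,6,4,1) outer product / 256
```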

chris-ha458 and others added 6 commits April 7, 2020 21:12

  • Initial implementation of blur layer; currently tests as correct against the Downsample of the original GitHub repo
  • Clean up code for PR
  • Add ResNet argument blur='' and implement blur for maxpool and strided convs in downsampling blocks
@rwightman (Collaborator)

Haven't forgotten about this. Since it impacts the main ResNet class, I wanted to tweak a few things and run a full regression test of all the other dependent models. I did try a filter-size-5 run; it was worse than the default 3.

@chris-ha458 (Contributor, Author)

Hmm, considering that the recent TResNet merge included an anti-aliasing layer,
where do you want to go with this? I feel like it is ripe for closing at this point unless there is more you want to implement.

Although I feel there is enough evidence that filter size 3 is the best compromise for the models and datasets in widespread use, here are some pointers if you would like to test more.

There is an obvious trade-off between resource use (runtime and memory) and accuracy as filter size increases.
What is less apparent is the consistency trade-off:
it is not clear that consistency (or robustness) keeps improving with filter size past 3.

All in all, I am personally content with the anti-aliasing layer pulled in via #121, with its JIT-optimized code.

@rwightman rwightman merged commit 9590f30 into huggingface:master May 2, 2020
@chris-ha458 chris-ha458 deleted the blur branch May 3, 2020 11:07