
TResNet implementation #121

Closed
mrT23 opened this issue Apr 11, 2020 · 5 comments

Comments

@mrT23
Contributor

mrT23 commented Apr 11, 2020

Hi @rwightman
My name is Tal; I am the main author of the TResNet paper.
https://github.com/mrT23/TResNet

I think you have an excellent repo that really contains top implementations of the best models available today.
In my (biased) opinion, in terms of the GPU speed-accuracy trade-off, TResNet beats any model you currently have :-)

Would you be interested in my opening a merge request to add TResNet to your models list?
The model files are self-contained, so the merge request would be quite easy.

thanks,
Tal

@rwightman
Collaborator

@mrT23 Hi Tal, I noticed your models the other week; they definitely look interesting. There's an open PR here with a BlurPool implementation where I was discussing your work with the author due to the overlap (anti-aliasing)... #101

I would definitely merge a PR with your models if it fits with the rest of the code here. As you say, it looks like it should be pretty straightforward.

My biggest concern is the InplaceABN dependency. I don't want that dependency making the rest of the repository unusable on Windows, or to require a hard dependency on the ABN module. So, if you can put a guard around the import to defer the failure from import time to model creation time when inplace_abn isn't installed, that'd be great.

something like...

try:
    from inplace_abn import InPlaceABN
    has_iabn = True
except ImportError:
    has_iabn = False

...

# model creation helper
def _create_tresnet():
    if not has_iabn:
        raise ImportError("Please install InplaceABN from....")

@mrT23
Contributor Author

mrT23 commented Apr 11, 2020

Inplace-ABN works perfectly well on Windows; I wrote the TResNet code on a Windows PC.
However, I understand your concern about additional dependencies, and I can easily add the protection.

Just to make sure we are aligned on the commit before I start working on it:
I am talking about adding an extra file to timm/models called 'tresnet.py'.
The file will be self-contained, with everything TResNet needs.
I know it might be nicer to add TResNet to the existing resnet.py file, but in fact TResNet and ResNet are quite different, despite the similar name.

If we are aligned, give me the OK sign and I will start working on the MR.

@rwightman
Collaborator

@mrT23 yup, sounds good! The comments in that other PR were about merging tresnet with the existing ResNet, but that wouldn't make sense as you say.

A separate, self-contained TResNet would be great.

The inplace-ABN repository is explicit about not working on Windows; I guess that just means they never tested it, so it's good to know that's not actually an issue. I'd still like to keep that dependency optional, though, yes.

@mrT23 mrT23 closed this as completed Apr 13, 2020
This was referenced Apr 13, 2020
@rwightman
Collaborator

rwightman commented Apr 28, 2020

@mrT23 After having a chance to go through your models in more detail and run a few tests, I have a few questions, mostly with regard to performance differences with JIT / no JIT and your global avg pool... overall solid model accuracy with good GPU throughput/memory utilization.

I did some experiments with JIT / no JIT for both the Downsample and SpaceToDepth layers. Within run-to-run measurement error, I detected no img/s differences and no GPU memory utilization differences, regardless of the combination of settings. The same goes for replacing the FastGlobalAvg pool with my own: no difference.

Did you run tests on these and find any noteworthy differences for fwd or fwd+bwd in PyTorch when enabling JIT or the AvgPool? For my comparisons, I used both 1080 Ti Pascal cards with FP32 and a Titan RTX with AMP enabled...

One other small thing, in your SE layer: any reason you used x_se2 for two lines instead of just clobbering x_se for all of them?

@mrT23
Contributor Author

mrT23 commented Apr 28, 2020

@rwightman, as always my answers are quite lengthy :)
I got asked a similar question by mail.

First, we can measure the runtime of each module specifically, with and without JIT.
Here is example code for SpaceToDepth:

import time

import torch

# SpaceToDepth and SpaceToDepthJit are defined in the TResNet model code
SpaceToDepthM = SpaceToDepth(block_size=4)
SpaceToDepthM_JIT = SpaceToDepthJit()
input = torch.zeros(64, 3, 224, 224).cuda()

# warmup
tmp = SpaceToDepthM(input)
# SpaceToDepth regular
ts = time.time()
for i in range(20):
    tmp = SpaceToDepthM(input)
tf = time.time()
dt = (tf - ts) / 20 * 1e3
print("SpaceToDepth regular time [milli_seconds]: {:2f}".format(dt))

# warmup
tmp = SpaceToDepthM_JIT(input)
# SpaceToDepth JIT
ts = time.time()
for i in range(20):
    tmp = SpaceToDepthM_JIT(input)
tf = time.time()
dt = (tf - ts) / 20 * 1e3
print("SpaceToDepth JIT time [milli_seconds]: {:2f}".format(dt))

results on my local machine:

SpaceToDepth regular time [milli_seconds]: 0.199497
SpaceToDepth JIT time [milli_seconds]: 0.099719
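For a fully self-contained comparison, here is a sketch with a minimal SpaceToDepth module (following the block rearrangement described in the TResNet paper) and a scripted counterpart. The helper names and the CPU fallback are mine, and `torch.cuda.synchronize()` is added so asynchronous CUDA kernels are counted in the timings:

```python
import time

import torch
import torch.nn as nn


class SpaceToDepth(nn.Module):
    """Fold each block_size x block_size spatial patch into the channel dim."""

    def __init__(self, block_size: int = 4):
        super().__init__()
        self.bs = block_size

    def forward(self, x):
        n = x.size(0)
        c = x.size(1)
        h = x.size(2)
        w = x.size(3)
        x = x.view(n, c, h // self.bs, self.bs, w // self.bs, self.bs)
        x = x.permute(0, 3, 5, 1, 2, 4).contiguous()
        return x.view(n, c * self.bs * self.bs, h // self.bs, w // self.bs)


def bench(module, x, iters: int = 20):
    """Average forward time in milliseconds, synchronizing around CUDA work."""
    module(x)  # warmup; also triggers compilation for scripted modules
    if x.is_cuda:
        torch.cuda.synchronize()
    ts = time.time()
    for _ in range(iters):
        module(x)
    if x.is_cuda:
        torch.cuda.synchronize()
    return (time.time() - ts) / iters * 1e3


device = 'cuda' if torch.cuda.is_available() else 'cpu'
x = torch.zeros(8, 3, 224, 224, device=device)
eager = SpaceToDepth(block_size=4)
scripted = torch.jit.script(eager)  # scripted counterpart of the same module
print("SpaceToDepth eager  [ms]: {:.3f}".format(bench(eager, x)))
print("SpaceToDepth script [ms]: {:.3f}".format(bench(scripted, x)))
```

Whether scripting actually helps depends on the device and PyTorch version, which is exactly the run-to-run question being discussed here.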

So per module, JIT works. But does it contribute to the final throughput?
My approach here is holistic: since it is not hard to write JIT code, do it everywhere you can! It won't hurt, and it might give an improvement.
As an analogy, in Python/NumPy you should always prefer vectorized operations over explicit loops; it's just good practice that pays off in the long run.
From this holistic point of view, TResNet_M has 25% more FLOPs than ResNet50, yet it is faster and supports a batch size twice as large. So as a whole, something works there.

To answer your questions specifically:

  • I agree that SpaceToDepth, with or without JIT, probably has a negligible impact on the final img/s throughput.
  • If you replace FastGlobalAvgPool with nn.AdaptiveAvgPool2d, you will definitely see a decrease in throughput; we have a lot of SE layers that use it.
  • x_se2 is my attempt to help PyTorch reuse memory by defining a new variable when I change the input dimension. Maybe it works, maybe not; it did not hurt.
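For context on that pooling point: the fast pooling trick is essentially a flattened mean, which avoids the generic kernel behind nn.AdaptiveAvgPool2d. A minimal sketch (the class name mirrors the one in the TResNet code; the equivalence check is mine):

```python
import torch
import torch.nn as nn


class FastGlobalAvgPool2d(nn.Module):
    """Global average pool via a single flattened mean over the spatial dims,
    instead of the generic adaptive-pooling kernel."""

    def __init__(self, flatten=False):
        super().__init__()
        self.flatten = flatten

    def forward(self, x):
        n, c = x.size(0), x.size(1)
        out = x.view(n, c, -1).mean(dim=2)
        return out if self.flatten else out.view(n, c, 1, 1)


x = torch.randn(4, 16, 7, 7)
fast = FastGlobalAvgPool2d()(x)
ref = nn.AdaptiveAvgPool2d(1)(x)
print(torch.allclose(fast, ref, atol=1e-6))  # numerically equivalent results
```

The two produce the same values; the difference is purely in which kernel runs, which matters when the op appears in every SE block.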

As a final remark, I strongly recommend also measuring training throughput and maximal batch size. For daily use, they are more important than inference speed. It is quite easy to build a model with decent inference speed but terrible training speed and a very low batch size (EfficientNet); I discuss this in the article.
Two other things that are usually forgotten when assessing an architecture: how easy it is to train, and how well it transfers. TResNet excels at both.

Tal
