TResNet implementation #121
@mrT23 Hi Tal, I noticed your models the other week; they definitely look interesting. There's an open PR here with a BlurPool implementation where I was discussing your work with the author due to the overlap (anti-aliasing): #101. I would definitely merge a PR with your models if it fits with the rest of the code here. It looks like it should be pretty straightforward, as you say. My biggest concern is the InplaceABN dependency. I don't want that dependency making the rest of the repository unusable on Windows, or introducing a hard dependency on the ABN module. So, if you can put a guard around the import to defer the failure when inplace-abn isn't installed from import time to model creation time, that'd be great. Something like...
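A guard of the kind rwightman describes might look like the sketch below. The names `has_iabn` and `create_tresnet`, and the error message, are illustrative only, not the actual timm code:

```python
# Hypothetical sketch of the import guard described above: defer a missing
# inplace_abn dependency from import time to model creation time.
# `has_iabn` and `create_tresnet` are illustrative names, not timm's real API.
try:
    from inplace_abn import InPlaceABN  # optional dependency
    has_iabn = True
except ImportError:
    has_iabn = False


def create_tresnet(**kwargs):
    # Only fail here, when a TResNet model is actually requested.
    if not has_iabn:
        raise ImportError(
            "InplaceABN is required for TResNet models: pip install inplace-abn")
    # ... construct and return the TResNet model here ...
```

This way `import timm`-style usage stays unaffected on systems without the package; only creating a TResNet model raises.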
Inplace-ABN works perfectly well on Windows. Just to make sure we are aligned on the commit before I start working on it: if we are aligned, give me the OK sign and I will start working on the MR.
@mrT23 Yup, sounds good! The comments in that other PR were about merging TResNet with the existing ResNet, but that wouldn't make sense, as you say. A separate, self-contained TResNet would be great. The inplace-ABN repository is explicit about it not working on Windows; I guess that just means they never tested it, good to know it's not actually an issue. I'd still like to keep that dependency optional though, yes.
@mrT23 After having a chance to go through your models in more detail and run a few tests, I have a few questions, mostly with regard to performance differences with JIT / no JIT and your global average pool... Overall, solid model accuracy with good GPU throughput and memory utilization. I did some experiments with JIT / no JIT for both the Downsample and SpaceToDepth layers. Within run-to-run measurement error, I detected no img/s differences and no GPU memory utilization differences, regardless of the combination of settings. The same for replacing FastGlobalAvgPool with my own: no difference. Did you run tests on these and find any noteworthy differences for fwd or fwd+bwd in PyTorch from enabling JIT or the AvgPool? I compared on both 1080 Ti Pascal cards with FP32 and a Titan RTX with AMP enabled... One other small thing in your SE layer: any reason you used ...
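For readers following along, the two layers under discussion can be sketched roughly as below. This is a simplified reconstruction from the TResNet paper's description, not the exact repository code:

```python
import torch
import torch.nn as nn


class SpaceToDepth(nn.Module):
    """Rearrange 2x2 spatial blocks into channels: (N, C, H, W) -> (N, 4C, H/2, W/2)."""

    def __init__(self, block_size: int = 2):
        super().__init__()
        self.bs = block_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.size()
        # Split each spatial dim into (out_size, block), then fold blocks into channels.
        x = x.view(n, c, h // self.bs, self.bs, w // self.bs, self.bs)
        x = x.permute(0, 3, 5, 1, 2, 4).contiguous()
        return x.view(n, c * self.bs * self.bs, h // self.bs, w // self.bs)


class FastGlobalAvgPool(nn.Module):
    """Global average pool via a flattened mean instead of adaptive_avg_pool2d."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c = x.size(0), x.size(1)
        return x.view(n, c, -1).mean(dim=2)


# The "JIT" variants being benchmarked are just scripted versions of the same modules:
s2d_jit = torch.jit.script(SpaceToDepth())
```

The scripted and eager modules compute identical outputs; the benchmarking question above is purely about throughput and memory.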
@rwightman, as always, my answers are quite lengthy :) First, we can measure the runtime of each module specifically, with and without JIT.
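One way to do such a per-module measurement is a small timing harness like the following. This is a generic CPU-only sketch, not the actual script used for the numbers below:

```python
import time

import torch


def time_module(mod: torch.nn.Module, x: torch.Tensor, iters: int = 50) -> float:
    """Return the average forward time of one module in milliseconds (CPU sketch)."""
    with torch.no_grad():
        for _ in range(5):  # warmup iterations, excluded from timing
            mod(x)
        start = time.perf_counter()
        for _ in range(iters):
            mod(x)
    return (time.perf_counter() - start) * 1000 / iters
```

On GPU one would call `torch.cuda.synchronize()` before reading the clock on each side; CUDA kernel launches are asynchronous, so wall-clock timing without synchronization is meaningless.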
Results on my local machine:
So per module, JIT works. But does it contribute to the final throughput? To answer your question specifically:
As a final remark, I strongly recommend also measuring training throughput and maximal batch size. For daily use, they are more important than inference speed. It is quite easy to build a model with decent inference speed but terrible training speed and a very low batch size (EfficientNet). Tal
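Tal's point about training throughput can be checked with a simple fwd+bwd loop. A generic sketch, assuming a classification model and synthetic data (not the benchmark actually used in the paper):

```python
import time

import torch
import torch.nn as nn


def train_throughput(model: nn.Module, batch_size: int = 32, img_size: int = 224,
                     iters: int = 10, num_classes: int = 10) -> float:
    """Measure images/sec for forward + backward + optimizer step (CPU sketch)."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    x = torch.randn(batch_size, 3, img_size, img_size)  # synthetic batch
    y = torch.randint(0, num_classes, (batch_size,))
    model(x)  # warmup
    start = time.perf_counter()
    for _ in range(iters):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return batch_size * iters / (time.perf_counter() - start)
```

Maximal batch size is usually found separately, by doubling the batch until the fwd+bwd pass runs out of GPU memory.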
Hi @rwightman
My name is Tal; I am the main author of the TResNet paper:
https://github.com/mrT23/TResNet
I think you have an excellent repo that really contains top implementations of the best models available today.
In my (biased) opinion, in terms of GPU speed-accuracy tradeoff, TResNet beats any model you currently have :-)
Would you be interested in me opening a merge request to add TResNet to your models list? The model files are self-contained, and the merge request would be quite easy.
thanks,
Tal