Cannot get the reported MACs in paper #10
Comments
Just by calculating each layer's MACs and summing them up. The T2T module has two attention layers and one projection layer. Then you have 14 consecutive attention blocks, each with 0.321 G MACs, 4.494 G in total. So, for your model T2T-ViTt-14, the total MACs would be 1.386 + 0.175 + 0.043 + 4.494 = 6.09 G.
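The per-component sum above can be sketched as a few lines of Python, using the figures quoted in this thread (the component names are just illustrative labels, not names from the T2T-ViT codebase):

```python
# Sum per-component MACs (in G) for T2T-ViTt-14, using the values quoted in this thread.
components = {
    "t2t_attention_1": 1.386,   # first token-to-token attention layer
    "t2t_attention_2": 0.175,   # second token-to-token attention layer
    "t2t_projection": 0.043,    # projection layer of the T2T module
    "vit_blocks": 0.321 * 14,   # 14 ViT blocks at 0.321 G each = 4.494 G
}
total = sum(components.values())
print(f"{total:.2f} G MACs")  # ~6.10 G
```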
Hi, we have updated the new MACs and some new Top-1 accuracies in our repo and Figure 1, and we will update them in our next arXiv version. For MACs, T2T-ViTt-14 has 6.1 G, T2T-ViTt-19 has 9.8 G, and T2T-ViTt-24 has 15.0 G. In Figure 1, we compare T2T-ViT-14, 19, 24 with ResNets and ViT; the MACs of T2T-ViT-14 is 5.2 G. All results are given in the table of T2T-ViT models.
Great, thanks
Hi. I am wondering, in your MACs calculation of the T2T module, did you ignore the MACs of some of its operations? During my FLOPs calculation for ViT, I found that the patch-embedding Conv operation takes a considerable amount of computation, so I am wondering how you calculate the MACs for that part. Another question: can I take the MACs reported in your repo to be the same as FLOPs? I have seen many different issues about the MACs calculation. Could you give some clear details about the calculation method in your repo? Thank you very much, and thanks for your great work.
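For reference, a standard way to count the MACs of a convolution is one multiply-accumulate per output element per kernel weight (and a common convention is FLOPs ≈ 2 × MACs, since each MAC is one multiply plus one add). A minimal sketch, using ViT-B/16 patch embedding on a 224×224 image as an assumed example (16×16 conv, stride 16, 3 → 768 channels, 14×14 output):

```python
def conv2d_macs(h_out, w_out, c_out, c_in, k):
    """MACs of a 2D convolution: one multiply-accumulate
    per output element per kernel weight."""
    return h_out * w_out * c_out * c_in * k * k

# Assumed example: ViT-B/16 patch embedding, 224x224 input ->
# 16x16 kernel, stride 16, 3 -> 768 channels, 14x14 output grid.
macs = conv2d_macs(14, 14, 768, 3, 16)
print(f"{macs / 1e9:.3f} G MACs")  # -> 0.116 G
```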
Hi,
I've calculated the MACs of the model and found that it is not consistent with the value reported in the paper.
If I understand correctly, the T2T-ViTt-14 model consists of the T2T module plus 14 original ViT blocks.
The MACs for those 14 ViT blocks would be 0.321 × 14 = 4.494 G.
For the first token-to-token attention, you calculate attention over 56×56 tokens, which is 3136 tokens, with feature dim = 64.
Considering only computing the affinity matrix and applying it to the values, the MACs would be: 3136 × 3136 × 64 + 3136 × 3136 × 64 ≈ 1.26 G,
which already adds up to 5.754 G, higher than the reported 5.2 G.
My full calculation for the T2T-ViTt-14 model comes to 6.09 G MACs. Can you tell me if I miscalculated something?
Best,
Haiping
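The attention MAC estimate in the comment above can be sketched as follows, counting only the Q·Kᵀ affinity matrix and its application to the values (the QKV and output projections are deliberately left out, matching the estimate in the thread):

```python
def attention_macs(n_tokens, dim):
    """MACs of the affinity matrix (N x d) @ (d x N) plus
    applying it to the values (N x N) @ (N x d),
    ignoring the QKV and output projections."""
    affinity = n_tokens * n_tokens * dim
    apply_values = n_tokens * n_tokens * dim
    return affinity + apply_values

# First token-to-token attention: 56x56 = 3136 tokens, dim = 64.
macs = attention_macs(56 * 56, 64)
print(f"{macs / 1e9:.2f} G MACs")  # -> 1.26 G, matching the estimate above
```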