
Cannot get the reported MACs in paper #10

Closed
happywu opened this issue Feb 4, 2021 · 5 comments

happywu commented Feb 4, 2021

Hi,

I've calculated the MACs of the model and found they are not consistent with the numbers reported in the paper.

If I understand correctly, the T2T-ViTt-14 model consists of the T2T module plus 14 standard ViT blocks.
The MACs for those 14 ViT blocks would be 0.321 x 14 = 4.494 G.

For the first token-to-token attention, you compute attention over 56 x 56 = 3136 tokens with feature dim = 64.
Counting only the affinity matrix and the weighted value already gives MACs: 3136 * 3136 * 64 + 3136 * 3136 * 64 = 1.26 G,
which brings the total to 5.754 G, higher than the reported 5.2 G.
My full calculation for the T2T-ViTt-14 model comes to 6.09 G MACs. Can you tell me if I miscalculated something?

Best,
Haiping
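(For reference, a minimal sketch of the arithmetic above, assuming only the two token-token matmuls with 56 x 56 = 3136 tokens and feature dim 64; it reproduces the ~1.26 G estimate quoted in the comment.)

```python
# A minimal sketch (not from the thread): MACs of the two token-token
# matmuls (Q @ K^T and attn @ V) in one self-attention layer.
def attention_matmul_macs(num_tokens: int, dim: int) -> int:
    return 2 * num_tokens * num_tokens * dim

# First T2T stage: 56 x 56 = 3136 tokens, feature dim 64.
print(attention_matmul_macs(56 * 56, 64) / 1e9)  # ~1.26 GMACs
```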

@yuanli2333
Collaborator

Hi,

Please refer to this issue. We use a script to calculate FLOPs, which should be roughly 2x larger than MACs.

BTW, how did you arrive at 6.09 G MACs for T2T-ViTt-14?
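(For illustration only, a worked example of the 2x convention using sizes from this thread; this is a reader's sketch, not the counting script mentioned above.)

```python
# Illustration of the usual convention FLOPs ≈ 2 × MACs
# (one multiply-accumulate = one multiply + one add).
tokens, d_in, d_out = 3136, 64, 64      # example sizes from this thread
macs = tokens * d_in * d_out            # ≈ 12.8 M multiply-accumulates
flops = 2 * macs                        # ≈ 25.7 M floating-point operations
print(f"{macs / 1e6:.1f} MMACs ≈ {flops / 1e6:.1f} MFLOPs")
```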


happywu commented Feb 8, 2021

Just by calculating each layer's MACs and summing them up.

The T2T module has two attention layers and one proj layer.
The first attention layer works on 56 x 56 = 3136 tokens with hidden size 64: the qkv embedding is (7x7x3) x 192 x 3136 = 0.089 G, the attention matrix is 3136 x 3136 x 64 = 0.63 G, the weighted value is 3136 x 3136 x 64 = 0.63 G, the proj layer is 64 x 64 x 3136 = 0.013 G, and the 2-layer MLP is 64 x 64 x 3136 x 2 = 0.026 G. So the first attention layer has 1.386 G MACs.
Likewise, the second attention layer has 0.175 G MACs. The last proj layer has 576 x 384 x 14 x 14 = 0.043 G.

Then you have 14 consecutive ViT blocks, each with 0.321 G MACs, for a total of 4.494 G.

So, for your T2T-ViTt-14 model, the total MACs would be 1.386 + 0.175 + 0.043 + 4.494 = 6.09 G.
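(A sketch summing the per-layer figures above; all numbers are taken directly from this comment.)

```python
# Sketch: sum the per-layer GMACs quoted above for T2T-ViTt-14.
t2t_attn1 = 0.089 + 0.63 + 0.63 + 0.013 + 0.026  # first T2T attention layer
t2t_attn2 = 0.175                                # second T2T attention layer
t2t_proj = 0.043                                 # final projection (576 x 384 x 14 x 14)
backbone = 14 * 0.321                            # 14 ViT blocks
print(t2t_attn1 + t2t_attn2 + t2t_proj + backbone)  # ~6.1 GMACs
```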

@yuanli2333
Collaborator

Hi, we have updated the MACs and some new Top-1 accuracy numbers in our repo and in Figure 1, and we will include them in the next arXiv version.

For MACs, T2T-ViTt-14 has 6.1 G, T2T-ViTt-19 has 9.8 G, and T2T-ViTt-24 has 15.0 G.

In Figure 1, we compare T2T-ViT-14, 19, and 24 with ResNets and ViT; the MACs of T2T-ViT-14 are 5.2 G. All results are given in the table of T2T-ViT models.


happywu commented Feb 19, 2021

Great, thanks

@Liuyang829

Hi. I am wondering whether, in your MACs calculation of the T2T module, you ignored the MACs of operations such as self.soft_split0 = nn.Unfold(kernel_size=(7, 7), stride=(4, 4), padding=(2, 2))?

In my FLOPs calculation for ViT, the patch embedding Conv operation takes a considerable amount of computation. I am wondering how you calculate the MACs of nn.Unfold.

Another question: can I treat the MACs reported in your repo as the same as FLOPs?

I have seen many different issues about the MACs calculation. Could you add some clear details about the calculation method to your repo?

Thank you very much. Thanks for your great work.
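(One common convention, sketched here as an assumption rather than the authors' counting script: nn.Unfold only rearranges pixels into patches and performs no multiply-accumulates itself, so it is usually counted as 0 MACs; the compute shows up in the projection applied to the unfolded patches. The embedding dim of 64 below is a hypothetical example.)

```python
import torch
import torch.nn as nn

# Sketch (not the authors' script): nn.Unfold is an im2col rearrangement,
# so it contributes no MACs; the MACs come from the projection that follows.
x = torch.randn(1, 3, 224, 224)
unfold = nn.Unfold(kernel_size=(7, 7), stride=(4, 4), padding=(2, 2))
patches = unfold(x)                      # (1, 3*7*7, 56*56) = (1, 147, 3136)
proj = nn.Linear(3 * 7 * 7, 64)          # hypothetical embedding dim of 64
tokens = proj(patches.transpose(1, 2))   # (1, 3136, 64)

# MACs of the projection: num_patches * in_features * out_features
macs = patches.shape[-1] * (3 * 7 * 7) * 64
print(f"{macs / 1e9:.3f} GMACs")         # ~0.030 GMACs
```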
