
Cannot get the reported MACs in paper #10

Closed
happywu opened this issue Feb 4, 2021 · 5 comments

happywu commented Feb 4, 2021

Hi,

I've calculated the MACs of the model and found they are not consistent with the numbers reported in the paper.

If I understand correctly, the T2T-ViTt-14 model consists of the T2T module plus 14 standard ViT blocks.
The MACs for those 14 ViT blocks would be 0.321 x 14 = 4.494 G.

For the first token-to-token attention, you compute attention over 56 x 56 = 3136 tokens with feature dim = 64.
Counting only the affinity matrix and the weighted value already gives MACs: 3136 * 3136 * 64 + 3136 * 3136 * 64 = 1.26 G,
which brings the total to 5.754 G, higher than the reported 5.2 G.
My full calculation for the T2T-ViTt-14 model comes to 6.09 G MACs. Can you tell me if I miscalculated something?

Best,
Haiping
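(For reference, a minimal sketch of the arithmetic above, assuming only the two token-token matmuls with 56 x 56 = 3136 tokens and feature dim 64; it reproduces the ~1.26 G estimate quoted in the comment.)

```python
# A minimal sketch (not from the thread): MACs of the two token-token
# matmuls (Q @ K^T and attn @ V) in one self-attention layer.
def attention_matmul_macs(num_tokens: int, dim: int) -> int:
    return 2 * num_tokens * num_tokens * dim

# First T2T stage: 56 x 56 = 3136 tokens, feature dim 64.
print(attention_matmul_macs(56 * 56, 64) / 1e9)  # ~1.26 GMACs
```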

@yuanli2333
Collaborator

Hi,

Please refer to this issue. We use a script to calculate FLOPs, which should be roughly 2x larger than MACs.

BTW, how did you arrive at 6.09 G MACs for T2T-ViTt-14?
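(For illustration only, a worked example of the 2x convention using sizes from this thread; this is a reader's sketch, not the counting script mentioned above.)

```python
# Illustration of the usual convention FLOPs ≈ 2 × MACs
# (one multiply-accumulate = one multiply + one add).
tokens, d_in, d_out = 3136, 64, 64      # example sizes from this thread
macs = tokens * d_in * d_out            # ≈ 12.8 M multiply-accumulates
flops = 2 * macs                        # ≈ 25.7 M floating-point operations
print(f"{macs / 1e6:.1f} MMACs ≈ {flops / 1e6:.1f} MFLOPs")
```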


happywu commented Feb 8, 2021

Just by calculating each layer's MACs and summing them up.

The T2T module has two attention layers and one proj layer.
The first attention layer works on 56 x 56 = 3136 tokens with hidden size 64: the qkv embedding is (7x7x3) x 192 x 3136 = 0.089 G, the attention matrix is 3136 x 3136 x 64 = 0.63 G, the weighted value is 3136 x 3136 x 64 = 0.63 G, the proj layer is 64 x 64 x 3136 = 0.013 G, and the 2-layer MLP is 64 x 64 x 3136 x 2 = 0.026 G. So the first attention layer has 1.386 G MACs.
Likewise, the second attention layer has 0.175 G MACs. The last proj layer has 576 x 384 x 14 x 14 = 0.043 G.

Then you have 14 consecutive ViT blocks, each with 0.321 G MACs, for a total of 4.494 G.

So, for your T2T-ViTt-14 model, the total MACs would be 1.386 + 0.175 + 0.043 + 4.494 = 6.09 G.
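(A sketch summing the per-layer figures above; all numbers are taken directly from this comment.)

```python
# Sketch: sum the per-layer GMACs quoted above for T2T-ViTt-14.
t2t_attn1 = 0.089 + 0.63 + 0.63 + 0.013 + 0.026  # first T2T attention layer
t2t_attn2 = 0.175                                # second T2T attention layer
t2t_proj = 0.043                                 # final projection (576 x 384 x 14 x 14)
backbone = 14 * 0.321                            # 14 ViT blocks
print(t2t_attn1 + t2t_attn2 + t2t_proj + backbone)  # ~6.1 GMACs
```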

@yuanli2333
Collaborator

Hi, we have updated the MACs and some new Top-1 accuracy numbers in our repo and in Figure 1, and we will include them in the next arXiv version.

For MACs, T2T-ViTt-14 has 6.1 G, T2T-ViTt-19 has 9.8 G, and T2T-ViTt-24 has 15.0 G.

In Figure 1, we compare T2T-ViT-14, 19, and 24 with ResNets and ViT; the MACs of T2T-ViT-14 are 5.2 G. All results are given in the table of T2T-ViT models.


happywu commented Feb 19, 2021

Great, thanks

@Liuyang829

Hi. I am wondering whether, in your MACs calculation of the T2T module, you ignored the MACs of operations such as self.soft_split0 = nn.Unfold(kernel_size=(7, 7), stride=(4, 4), padding=(2, 2))?

In my FLOPs calculation for ViT, the patch embedding Conv operation takes a considerable amount of computation. I am wondering how you calculate the MACs of nn.Unfold.

Another question: can I treat the MACs reported in your repo as the same as FLOPs?

I have seen many different issues about the MACs calculation. Could you add some clear details about the calculation method to your repo?

Thank you very much. Thanks for your great work.
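(One common convention, sketched here as an assumption rather than the authors' counting script: nn.Unfold only rearranges pixels into patches and performs no multiply-accumulates itself, so it is usually counted as 0 MACs; the compute shows up in the projection applied to the unfolded patches. The embedding dim of 64 below is a hypothetical example.)

```python
import torch
import torch.nn as nn

# Sketch (not the authors' script): nn.Unfold is an im2col rearrangement,
# so it contributes no MACs; the MACs come from the projection that follows.
x = torch.randn(1, 3, 224, 224)
unfold = nn.Unfold(kernel_size=(7, 7), stride=(4, 4), padding=(2, 2))
patches = unfold(x)                      # (1, 3*7*7, 56*56) = (1, 147, 3136)
proj = nn.Linear(3 * 7 * 7, 64)          # hypothetical embedding dim of 64
tokens = proj(patches.transpose(1, 2))   # (1, 3136, 64)

# MACs of the projection: num_patches * in_features * out_features
macs = patches.shape[-1] * (3 * 7 * 7) * 64
print(f"{macs / 1e9:.3f} GMACs")         # ~0.030 GMACs
```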
